TEXUS: A Task-based Approach for Table Extraction and Understanding. Roya Rastan, Hye-young Paik, J. Shepherd. DOI: 10.1145/2682571.2797069
In this paper, we propose a precise, comprehensive model of table processing which aims to remedy some of the problems in the discussion of table processing in the literature. The model targets application-independent, end-to-end table processing, and thus encompasses a large subset of the work in the area. The model can be used to aid the design of table processing systems (we provide an example of such a system), can serve as a reference framework for evaluating the performance of table processing systems, and can assist in clarifying terminological differences in the table processing literature.
Interlinking English and Chinese RDF Data Using BabelNet. Tatiana Lesnikova, Jérôme David, J. Euzenat. DOI: 10.1145/2682571.2797089
Linked data technologies make it possible to publish and link structured data on the Web. Although RDF is not about text, many RDF data providers publish their data in their own language. Cross-lingual interlinking aims at discovering links between identical resources across knowledge bases in different languages. In this paper, we present a method for interlinking RDF resources described in English and Chinese using the BabelNet multilingual lexicon. Resources are represented as vectors of identifiers and then similarity between these resources is computed. The method achieves an F-measure of 88%. The results are also compared to a translation-based method.
Creating eBooks with Accessible Graphics Content. Cagatay Goncu, K. Marriott. DOI: 10.1145/2682571.2797076
We present a new model for presenting graphics in eBooks to blind readers. It is based on the GraViewer app, which allows an accessible graphic embedded in an iBook to be explored on an iPad using speech and non-speech audio feedback. We also introduce a web-based tool, GraAuthor, for creating such accessible graphics, and describe the workflow for including these in an iBook. Unlike previous approaches, our model provides an integrated digital presentation of both text and graphics and allows the general public to create accessible graphics.
Detecting XSLT Rules Affected by Schema Evolution. Yang Wu, Nobutaka Suzuki. DOI: 10.1145/2682571.2797086
In general, schemas of XML documents are continuously updated according to changes in the real world. If a schema is updated, then XSLT stylesheets are also affected by the schema update. To maintain the consistency of XSLT stylesheets with updated schemas, we have to detect the XSLT rules affected by schema updates. However, detecting such XSLT rules manually is a difficult and time-consuming task, since recent DTDs and XSLT stylesheets are becoming more complex and users do not always fully understand the dependencies between XSLT stylesheets and DTDs. In this paper, we consider three subclasses of unranked tree transducers and present an algorithm for detecting XSLT rules affected by a DTD update for these classes.
{"title":"Session details: Information Summarized","authors":"D. Brailsford","doi":"10.1145/3256803","DOIUrl":"https://doi.org/10.1145/3256803","url":null,"abstract":"","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116958419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Logical Structures","authors":"E. Munson","doi":"10.1145/3256807","DOIUrl":"https://doi.org/10.1145/3256807","url":null,"abstract":"","PeriodicalId":106339,"journal":{"name":"Proceedings of the 2015 ACM Symposium on Document Engineering","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121919904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What Is This Thing Called Linked Data? Manuel Atencia, Jérôme David, P. Genoud. DOI: 10.1145/2682571.2801035
The Linked Data initiative has made it possible for the web to evolve from being a global information space in which only documents are linked to one in which both documents and data are linked: a web of documents and data. This tutorial aims to give an overview of the principles, models and technologies underlying Linked Data.
Efficient Computation of Co-occurrence Based Word Relatedness. Jie Mei, Xinxin Kou, Zhimin Yao, A. Rau-Chaplin, Aminul Islam, A. Mohammad, E. Milios. DOI: 10.1145/2682571.2797088
Measuring document relatedness using unsupervised co-occurrence-based word relatedness methods consumes considerable processing time and memory. This paper introduces the application of compact data structures for efficient computation of word relatedness based on corpus statistics. The data structure is used to efficiently look up: (1) the corpus statistics for the Common Word Relatedness Approach, and (2) the pairwise word relatedness for the Algorithm Specific Word Relatedness Approach. These two approaches significantly accelerate the processing time of word relatedness methods and reduce the space cost of storing co-occurrence statistics in memory, making text mining tasks like classification and clustering based on word relatedness practical.
BBookX: An Automatic Book Creation Framework. Chen Liang, Shuting Wang, Zhaohui Wu, Kyle Williams, B. Pursel, Benjamin Bräutigam, Sherwyn Saul, Hannah Williams, Kyle Bowen, C. Lee Giles. DOI: 10.1145/2682571.2797094
As more educational resources become available online, it is possible to acquire more up-to-date knowledge and information. We propose BBookX, a novel computer-facilitated system that automatically and collaboratively builds free open online books using publicly available educational resources such as Wikipedia. BBookX has two separate components: one creates an open version of existing books by linking different book chapters to Wikipedia articles, while the other, with an interactive user interface, supports real-time book creation in which users can modify a generated book through explicit feedback.
Knuth-Plass Revisited: Flexible Line-Breaking for Automatic Document Layout. Tamir Hassan, Andrew Hunter. DOI: 10.1145/2682571.2797091
There is an inherent flexibility in typesetting a block of text. Traditionally, line breaks would be manually chosen at strategic points in such a way as to minimize the amount of whitespace in each line. Hyphenation would only be used as a last resort. Knuth and Plass automated this optimization procedure, which has been used in various typesetting systems and DTP applications ever since. However, an optimal solution for the line-breaking problem does not necessarily lead us to an optimal document layout on the whole. The flexibility of choosing line breaks enables us, in many cases, to adjust the height of a paragraph by changing the number of lines, without having to make adjustments to font size, leading, etc. In many cases, the word spacing remains within the usual tolerances and visual quality does not noticeably suffer. This paper presents a modification to the Knuth-Plass algorithm to return several results for a given column of text, each corresponding to a different height, and describes steps to quantify the amount of expected flexibility in a given paragraph. We conclude with a discussion on how such "sub-optimal" results can lead to a better overall document layout, particularly in the context of mobile layouts, where flexibility is of key importance.