D. Archer, Merja Kytö, Alistair Baron, Paul Rayson
Abstract Corpora of Early Modern English have been collected and released for research for a number of years. With large scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semiautomatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.
{"title":"Guidelines for normalising Early Modern English corpora: Decisions and justifications","authors":"D. Archer, Merja Kytö, Alistair Baron, Paul Rayson","doi":"10.1515/icame-2015-0001","DOIUrl":"https://doi.org/10.1515/icame-2015-0001","url":null,"abstract":"Abstract Corpora of Early Modern English have been collected and released for research for a number of years. With large scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semiautomatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":"68 1","pages":"24 - 5"},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79966460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"David L. Hoover, Jonathan Culpeper and Kieran O’Halloran. Digital literary studies","authors":"Jesse Egbert","doi":"10.1515/icame-2015-0009","DOIUrl":"https://doi.org/10.1515/icame-2015-0009","url":null,"abstract":"","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":"63 1","pages":"152 - 156"},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80998285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lieven Vandelanotte, Kristin Davidse, Caroline Gentens and Ditte Kimps (eds.). Recent advances in corpus linguistics. Developing and exploiting corpora","authors":"Leonie Wiemeyer","doi":"10.1515/icame-2015-0012","DOIUrl":"https://doi.org/10.1515/icame-2015-0012","url":null,"abstract":"","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":"2 1","pages":"171 - 177"},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86627118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christoph Rühlemann, Andrej Bagoutdinov, M. O'Donnell
Abstract This paper outlines a modest approach to XPath and XQuery, tools allowing the navigation and exploitation of XML-encoded texts. The paper starts off from where Andrew Hardie’s paper “Modest XML for corpora: Not a standard, but a suggestion” (Hardie 2014) left the reader, namely wondering how one’s corpus can be usefully analyzed once its XML-encoding is finished, a question the paper did not address. Hardie argued persuasively that “there is a clear benefit to be had from a set of recommendations (not a standard) that outlines general best practices in the use of XML in corpora without going into any of the more technical aspects of XML or the full weight of TEI encoding” (Hardie 2014: 73). In a similar vein this paper argues that even a basic understanding of XPath and XQuery can bring great benefits to corpus linguists. To make this point, we present not only a modest introduction to basic structures underlying the XPath and XQuery syntax but demonstrate their analytical potential using Obama’s 2009 Inaugural Address as a test bed. The speech was encoded in XML, automatically PoS-tagged and manually annotated on additional layers that target two rhetorical figures, anaphora and isocola. We refer to this resource as the Inaugural Rhetorical Corpus (IRC). Further, we created a companion website hosting not only the Inaugural Rhetorical Corpus, but also the Inaugural Training Corpus) (a training corpus in the form of an abbreviated version of the IRC to allow manual checks of query results) as well as an extensive list of tried and tested queries for use with either corpus. All of the queries presented in this paper are at beginners to lower-intermediate levels of XPath/XQuery expertise. Nonetheless, they yield fruitful results: they show how Obama uses the inclusive pronouns we and our as a discursive strategy to advance his political strategy to re-focus American politics on economic and domestic matters. Further, they demonstrate how sentence length contributes to the build-up of climactic tension. Finally, they suggest that Obama’s signature rhetorical figure is the isocolon and that the overwhelming majority of isocola in the speech instantiate the crescens type, where the cola gradually increase in length over the sequence.
{"title":"Modest XPath and XQuery for corpora: Exploiting deep XML annotation","authors":"Christoph Rühlemann, Andrej Bagoutdinov, M. O'Donnell","doi":"10.1515/icame-2015-0003","DOIUrl":"https://doi.org/10.1515/icame-2015-0003","url":null,"abstract":"Abstract This paper outlines a modest approach to XPath and XQuery, tools allowing the navigation and exploitation of XML-encoded texts. The paper starts off from where Andrew Hardie’s paper “Modest XML for corpora: Not a standard, but a suggestion” (Hardie 2014) left the reader, namely wondering how one’s corpus can be usefully analyzed once its XML-encoding is finished, a question the paper did not address. Hardie argued persuasively that “there is a clear benefit to be had from a set of recommendations (not a standard) that outlines general best practices in the use of XML in corpora without going into any of the more technical aspects of XML or the full weight of TEI encoding” (Hardie 2014: 73). In a similar vein this paper argues that even a basic understanding of XPath and XQuery can bring great benefits to corpus linguists. To make this point, we present not only a modest introduction to basic structures underlying the XPath and XQuery syntax but demonstrate their analytical potential using Obama’s 2009 Inaugural Address as a test bed. The speech was encoded in XML, automatically PoS-tagged and manually annotated on additional layers that target two rhetorical figures, anaphora and isocola. We refer to this resource as the Inaugural Rhetorical Corpus (IRC). Further, we created a companion website hosting not only the Inaugural Rhetorical Corpus, but also the Inaugural Training Corpus) (a training corpus in the form of an abbreviated version of the IRC to allow manual checks of query results) as well as an extensive list of tried and tested queries for use with either corpus. All of the queries presented in this paper are at beginners to lower-intermediate levels of XPath/XQuery expertise. Nonetheless, they yield fruitful results: they show how Obama uses the inclusive pronouns we and our as a discursive strategy to advance his political strategy to re-focus American politics on economic and domestic matters. Further, they demonstrate how sentence length contributes to the build-up of climactic tension. Finally, they suggest that Obama’s signature rhetorical figure is the isocolon and that the overwhelming majority of isocola in the speech instantiate the crescens type, where the cola gradually increase in length over the sequence.","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":"3 1","pages":"47 - 84"},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75259741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract This study involved the creation of a corpus of children’s literature spanning 5.5 million words. Using concordance software, the corpus was able to show the most frequent words and collocations. These will be of interest both to literary researchers in the genre of children’s literature and also teachers and applied linguists working with adult students of English.
{"title":"Word frequency and collocation: Using children’s literature in adult learning","authors":"E. Thomas","doi":"10.1515/icame-2015-0004","DOIUrl":"https://doi.org/10.1515/icame-2015-0004","url":null,"abstract":"Abstract This study involved the creation of a corpus of children’s literature spanning 5.5 million words. Using concordance software, the corpus was able to show the most frequent words and collocations. These will be of interest both to literary researchers in the genre of children’s literature and also teachers and applied linguists working with adult students of English.","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":"206 1","pages":"110 - 85"},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73626032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tony Berber Sardinha and Marcia Veirano Pinto (eds.). Multi-Dimensional analysis, 25 years on – a tribute to Douglas Biber","authors":"Marco Schilk","doi":"10.1515/icame-2015-0006","DOIUrl":"https://doi.org/10.1515/icame-2015-0006","url":null,"abstract":"","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":"69 1","pages":"135 - 139"},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84593386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Claudia Claridge. Hyperbole in English: A corpus-based study of exaggeration","authors":"M. Levin","doi":"10.1515/icame-2015-0007","DOIUrl":"https://doi.org/10.1515/icame-2015-0007","url":null,"abstract":"","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":"7 1","pages":"140 - 144"},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76065378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Christoph Rühlemann. Narrative in English conversation","authors":"A. Partington","doi":"10.1515/icame-2015-0011","DOIUrl":"https://doi.org/10.1515/icame-2015-0011","url":null,"abstract":"","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":"15 1","pages":"164 - 170"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87046952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review: Sandra Götz. Fluency in native and nonnative English speech","authors":"P. Pérez-Paredes","doi":"10.2478/icame-2014-0013","DOIUrl":"https://doi.org/10.2478/icame-2014-0013","url":null,"abstract":"","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":"0 1","pages":"183 - 187"},"PeriodicalIF":0.0,"publicationDate":"2014-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88580228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}