Pub Date: 2021-07-14 DOI: 10.1007/s42001-021-00130-y
David Rozado, Musa al-Gharbi
{"title":"Using word embeddings to probe sentiment associations of politically loaded terms in news and opinion articles from news media outlets","authors":"David Rozado, Musa al-Gharbi","doi":"10.1007/s42001-021-00130-y","DOIUrl":"https://doi.org/10.1007/s42001-021-00130-y","url":null,"abstract":"","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"12 1","pages":"427 - 448"},"PeriodicalIF":3.2,"publicationDate":"2021-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75811265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-07-01 DOI: 10.1007/s42001-021-00132-w
Rachel Dinh, Patrick Gildersleve, Chris Blex, T. Yasseri
{"title":"Computational courtship understanding the evolution of online dating through large-scale data analysis","authors":"Rachel Dinh, Patrick Gildersleve, Chris Blex, T. Yasseri","doi":"10.1007/s42001-021-00132-w","DOIUrl":"https://doi.org/10.1007/s42001-021-00132-w","url":null,"abstract":"","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"25 1","pages":"401 - 426"},"PeriodicalIF":3.2,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76028367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optical Character Recognition (OCR) can open up understudied historical documents to computational analysis, but the accuracy of OCR software varies. This article reports a benchmarking experiment comparing the performance of Tesseract, Amazon Textract, and Google Document AI on images of English and Arabic text. English-language book scans (n = 322) and Arabic-language article scans (n = 100) were replicated 43 times with different types of artificial noise for a corpus of 18,568 documents, generating 51,304 process requests. Document AI delivered the best results, and the server-based processors (Textract and Document AI) performed substantially better than Tesseract, especially on noisy documents. Accuracy for English was considerably higher than for Arabic. Specifying the relative performance of three leading OCR products and the differential effects of commonly found noise types can help scholars identify better OCR solutions for their research needs. The test materials have been preserved in the openly available “Noisy OCR Dataset” (NOD) for reuse in future benchmarking studies.
{"title":"OCR with Tesseract, Amazon Textract, and Google Document AI: a benchmarking experiment","authors":"Thomas Hegghammer","doi":"10.31235/osf.io/6zfvs","DOIUrl":"https://doi.org/10.31235/osf.io/6zfvs","url":null,"abstract":"Optical Character Recognition (OCR) can open up understudied historical documents to computational analysis, but the accuracy of OCR software varies. This article reports a benchmarking experiment comparing the performance of Tesseract, Amazon Textract, and Google Document AI on images of English and Arabic text. English-language book scans (n = 322) and Arabic-language article scans (n = 100) were replicated 43 times with different types of artificial noise for a corpus of 18,568 documents, generating 51,304 process requests. Document AI delivered the best results, and the server-based processors (Textract and Document AI) performed substantially better than Tesseract, especially on noisy documents. Accuracy for English was considerably higher than for Arabic. Specifying the relative performance of three leading OCR products and the differential effects of commonly found noise types can help scholars identify better OCR solutions for their research needs. The test materials have been preserved in the openly available “Noisy OCR Dataset” (NOD) for reuse in future benchmarking studies.","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"57 1","pages":"861-882"},"PeriodicalIF":3.2,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78057631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
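The corpus figures quoted in the abstract of the OCR benchmarking entry can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that each original scan is kept alongside its 43 noisy copies (44 versions per scan), and that the English documents were sent to all three engines while the Arabic documents were sent to only two. All variable and function names are hypothetical.

```python
# Cross-check of the corpus figures reported in the abstract.
ENGLISH_SCANS = 322   # English-language book scans (from the abstract)
ARABIC_SCANS = 100    # Arabic-language article scans (from the abstract)
NOISY_COPIES = 43     # noisy replications per scan (from the abstract)

# Assumption: the clean original is retained next to its noisy copies.
VERSIONS_PER_SCAN = NOISY_COPIES + 1


def corpus_size(scans: int, versions: int = VERSIONS_PER_SCAN) -> int:
    """Number of document images produced from a set of source scans."""
    return scans * versions


english_docs = corpus_size(ENGLISH_SCANS)
arabic_docs = corpus_size(ARABIC_SCANS)
total_docs = english_docs + arabic_docs

# Assumption: three engines processed English, two processed Arabic.
ENGINES_ENGLISH = 3
ENGINES_ARABIC = 2
total_requests = english_docs * ENGINES_ENGLISH + arabic_docs * ENGINES_ARABIC

print(f"documents: {total_docs}, requests: {total_requests}")
```

Under these assumptions the totals come out to 18,568 documents and 51,304 requests, matching the figures in the abstract.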
Pub Date: 2021-06-18 DOI: 10.1007/s42001-021-00123-x
C. Ellinas, C. Nicolaides, Naoki Masuda
{"title":"Mitigation strategies against cascading failures within a project activity network","authors":"C. Ellinas, C. Nicolaides, Naoki Masuda","doi":"10.1007/s42001-021-00123-x","DOIUrl":"https://doi.org/10.1007/s42001-021-00123-x","url":null,"abstract":"","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"10 1","pages":"383 - 400"},"PeriodicalIF":3.2,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85951292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-11 DOI: 10.1007/S42001-021-00124-W
B. Klemens
{"title":"An analysis of US domestic migration via subset-stable measures of administrative data","authors":"B. Klemens","doi":"10.1007/S42001-021-00124-W","DOIUrl":"https://doi.org/10.1007/S42001-021-00124-W","url":null,"abstract":"","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"24 1","pages":"351-382"},"PeriodicalIF":3.2,"publicationDate":"2021-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84722138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-10 DOI: 10.1007/s42001-021-00127-7
Jie Gu, Yunjie Xu
{"title":"Battle of positioning: exploring the role of bridges in competitive diffusion","authors":"Jie Gu, Yunjie Xu","doi":"10.1007/s42001-021-00127-7","DOIUrl":"https://doi.org/10.1007/s42001-021-00127-7","url":null,"abstract":"","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"26 1","pages":"319 - 350"},"PeriodicalIF":3.2,"publicationDate":"2021-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84383241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-08 DOI: 10.1007/s42001-021-00125-9
A. Saxena, Harita Reddy
{"title":"Users roles identification on online crowdsourced Q&A platforms and encyclopedias: a survey","authors":"A. Saxena, Harita Reddy","doi":"10.1007/s42001-021-00125-9","DOIUrl":"https://doi.org/10.1007/s42001-021-00125-9","url":null,"abstract":"","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"208 1","pages":"285 - 317"},"PeriodicalIF":3.2,"publicationDate":"2021-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80548578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-07 DOI: 10.1007/s42001-021-00126-8
J. Saxon, Julia Koschinsky, Karina Acosta, V. Anguiano, L. Anselin, Sergio J. Rey
{"title":"An open software environment to make spatial access metrics more accessible","authors":"J. Saxon, Julia Koschinsky, Karina Acosta, V. Anguiano, L. Anselin, Sergio J. Rey","doi":"10.1007/s42001-021-00126-8","DOIUrl":"https://doi.org/10.1007/s42001-021-00126-8","url":null,"abstract":"","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"95 1 1","pages":"265 - 284"},"PeriodicalIF":3.2,"publicationDate":"2021-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83350158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-06 DOI: 10.1007/s42001-021-00128-6
Miguel Won, Jorge M. Fernandes
{"title":"Analyzing Twitter networks using graph embeddings: an application to the British case","authors":"Miguel Won, Jorge M. Fernandes","doi":"10.1007/s42001-021-00128-6","DOIUrl":"https://doi.org/10.1007/s42001-021-00128-6","url":null,"abstract":"","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"16 1","pages":"253 - 263"},"PeriodicalIF":3.2,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78522235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-31 DOI: 10.1007/s42001-021-00119-7
J. Clelland, H. Colgate, Daryl R. DeFord, Beth Malmskog, Flavia Sancier-Barbosa
{"title":"Colorado in context: Congressional redistricting and competing fairness criteria in Colorado","authors":"J. Clelland, H. Colgate, Daryl R. DeFord, Beth Malmskog, Flavia Sancier-Barbosa","doi":"10.1007/s42001-021-00119-7","DOIUrl":"https://doi.org/10.1007/s42001-021-00119-7","url":null,"abstract":"","PeriodicalId":29946,"journal":{"name":"Journal of Computational Social Science","volume":"14 1","pages":"189 - 226"},"PeriodicalIF":3.2,"publicationDate":"2021-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73643669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}