Pub Date: 2022-11-21 DOI: 10.1007/s42803-022-00053-8
Jeffrey R. Tharsen
Title: From form to sound 自形至聲: visual and aural representations of premodern Chinese phonology and phonorhetoric with applications for phonetic scripts
Journal: International Journal of Digital Humanities, pp. 115-129
Pub Date: 2022-11-16 DOI: 10.1007/s42803-022-00052-9
Caio Mello, Gullal S Cheema, Gaurish Thakkar
This study presents an approach to the challenges of applying Sentiment Analysis (SA) to news articles in a multilingual corpus. It examines the use and combination of multiple algorithms to explore news articles published in English and Portuguese. The methodology starts by evaluating and combining four SA algorithms (SenticNet, SentiStrength, Vader and BERT, with BERT trained on two datasets) to improve the quality of outputs. A thorough review of the algorithms' limitations is conducted using SHAP, an explainable AI tool, resulting in a list of issues that researchers must consider before using SA to interpret texts. We propose a combination of the three best classifiers (Vader, Amazon BERT and Sent140 BERT) to identify contradictory results, improving the quality of the positive, neutral and negative labels assigned to the texts. Challenges with translation are addressed, indicating possible solutions for non-English corpora. As a case study, the method is applied to the media coverage of the London 2012 and Rio 2016 Olympic legacies. The combination of different classifiers proved effective, revealing the imbalance between the media coverage of London 2012, which was much more positive, and Rio 2016, which was more negative.
Title: Combining sentiment analysis classifiers to explore multilingual news articles covering London 2012 and Rio 2016 Olympics
Journal: International Journal of Digital Humanities, pp. 1-27
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9667437/pdf/
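The abstract above describes combining three classifiers to identify contradictory results. A minimal sketch of one way such a combination could work is given here; it is not the authors' implementation. The function name `combine_labels`, the classifier keys, and the majority-vote rule are all assumptions made for illustration, and the labels passed in are invented inputs rather than results from the study.

```python
from collections import Counter

# The three sentiment labels named in the abstract.
LABELS = ("positive", "neutral", "negative")


def combine_labels(votes):
    """Combine the labels that several classifiers gave to one text.

    `votes` maps a classifier name to its label (hypothetical interface).
    Returns (label, contradictory):
      label         -- the strict-majority label, or None if there is none
      contradictory -- True when at least one classifier says 'positive'
                       and at least one says 'negative'
    """
    for name, label in votes.items():
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} from {name}")

    counts = Counter(votes.values())
    contradictory = counts["positive"] > 0 and counts["negative"] > 0

    label, n = counts.most_common(1)[0]
    if n * 2 <= len(votes):  # no strict majority among the classifiers
        label = None
    return label, contradictory


if __name__ == "__main__":
    # Illustrative inputs only; these are not outputs of the real models.
    example = {
        "vader": "positive",
        "amazon_bert": "negative",
        "sent140_bert": "negative",
    }
    print(combine_labels(example))
```

Texts flagged as contradictory could then be set aside for manual review instead of being assigned a label automatically, which is one plausible reading of how contradictory results improve label quality.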
Pub Date: 2022-10-10 DOI: 10.1007/s42803-022-00054-7
Wenyi Shang, T. Underwood
Title: Civil service examination records and political independence in the autonomous northeastern region during the second half of the Tang dynasty (755–907 C.E.)
Journal: International Journal of Digital Humanities, pp. 41-59
Pub Date: 2022-10-10 DOI: 10.1007/s42803-022-00051-w
Erwan Moreau, Carl Vogel
Title: CLG Authorship Analytics: a library for authorship verification
Journal: International Journal of Digital Humanities, pp. 5-27
Pub Date: 2022-10-10 DOI: 10.1007/s42803-022-00049-4
Ahmed Izzidien, Stephen Fitz, Peter Romero, Bao S Loe, David Stillwell
Fairness is a principal social value that is observable in civilisations around the world. Yet a fairness metric for digital texts that describe even a simple social interaction, e.g., 'The boy hurt the girl', has not been developed. We address this by employing word embeddings together with factors found in a new social psychology literature review on the topic. We use these factors to build fairness vectors. These vectors are used as sentence-level measures, whereby each dimension reflects a fairness component. The approach is employed to approximate human perceptions of fairness. The method leverages a pro-social bias within word embeddings, for which we obtain an F1 of 79.8 on a list of sentences using the Universal Sentence Encoder (USE). A second approach, using principal component analysis (PCA) and machine learning (ML), produces an F1 of 86.2. Repeating these tests using Sentence Bidirectional Encoder Representations from Transformers (SBERT) produces an F1 of 96.9 and 100, respectively. Improvements using subspace representations are further suggested. By proposing a first-principles approach, the paper contributes to the analysis of digital texts along an ethical dimension.
Title: Developing a sentence level fairness metric using word embeddings
Journal: International Journal of Digital Humanities, pp. 1-36
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9549858/pdf/
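The abstract above mentions fairness vectors in which each dimension reflects a fairness component. A rough sketch of how a per-component score could be derived from sentence embeddings follows; it is an assumption for illustration, not the paper's construction. The factor names (`factor_a`, `factor_b`), the anchor-difference axes, and the 3-dimensional toy vectors are all hypothetical; real embeddings from USE or SBERT would take the place of the toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def axis(pos_anchors, neg_anchors):
    """Direction from the mean 'negative' anchor to the mean 'positive' anchor."""
    p, q = mean(pos_anchors), mean(neg_anchors)
    return [a - b for a, b in zip(p, q)]


def fairness_vector(sentence_emb, axes):
    """One score per component: cosine of the sentence with that component's axis."""
    return {name: cosine(sentence_emb, ax) for name, ax in axes.items()}


if __name__ == "__main__":
    # Toy anchors standing in for embeddings of fair/unfair example terms.
    axes = {
        "factor_a": axis([[1.0, 0.0, 0.0]], [[-1.0, 0.0, 0.0]]),
        "factor_b": axis([[0.0, 1.0, 0.0]], [[0.0, -1.0, 0.0]]),
    }
    # Toy sentence embedding; a real one would come from an encoder model.
    print(fairness_vector([1.0, 1.0, 0.0], axes))
```

Under this sketch, the resulting dictionary plays the role of a sentence-level fairness vector, and a downstream classifier could be trained on it.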
Pub Date: 2022-10-06 DOI: 10.1007/s42803-022-00048-5
Shih-Pei Chen, Calvin Yeh, Sean Wang, Qun Che
Title: Treating a genre as a database: a digital research methodology for studying Chinese local gazetteers
Journal: International Journal of Digital Humanities, pp. 171-193
Pub Date: 2022-09-21 DOI: 10.1007/s42803-022-00050-x
Tunç Yılmaz, Tatjana Scheffler
Title: Song authorship attribution: a lyrics and rhyme based approach
Journal: International Journal of Digital Humanities, pp. 29-44
Pub Date: 2022-07-12 DOI: 10.1007/s42803-022-00046-7
John Pavlopoulos, M. Konstantinidou
Title: Computational authorship analysis of the homeric poems
Journal: International Journal of Digital Humanities, pp. 45-64
Pub Date: 2022-07-12 DOI: 10.1007/s42803-022-00047-6
Stephen H. Whiteman
Title: On uncertain ground: lost landscapes, digital mediation, and site-based research at early Qing Chengde
Journal: International Journal of Digital Humanities, pp. 1-35
Pub Date: 2022-05-30 DOI: 10.1007/s42803-022-00045-8
N. Povroznik
Title: Web-history: designing a course, shaping the discipline
Journal: International Journal of Digital Humanities