Government Data Openness and Coverage: How Do They Affect Trust in European Countries?
Nicolás Gonzálvez-Gallego, Laura Nieto-Torrejón. Journal of Data and Information Science, 6(1), 139-153. Published 2021-01-27. https://doi.org/10.2478/jdis-2021-0010

Abstract
Purpose: This paper assesses whether the extent of openness and the coverage of data sets released by European governments have a significant impact on citizen trust in public institutions.
Design/methodology/approach: Data on openness and coverage were collected from the Open Data Inventory 2018 (ODIN), by Open Data Watch; institutional trust is built as a formative construct based on the European Social Survey (ESS), Round 9. The relations between open government data features and trust were tested using structural equation modelling (SEM).
Findings: The paper reveals that as European governments improve data openness, disaggregation, and time coverage, people tend to trust them more. However, the effect is still small and, comparatively, the effect of data coverage on citizens' confidence is more than twice the impact of openness.
Research limitations: This paper analyzes the causal effect of Open Government Data (OGD) features captured at a single point in time. In the coming years, as OGD implementation matures and a more consistent effect on people is expected, time series analysis will provide deeper insight.
Practical implications: Public officers should continue working on a technological framework that helps make OGD truly open. They should improve the added value of the increasing amount of open data currently available in order to boost internal and external innovations valuable to both public agencies and citizens.
Originality/value: In a field of knowledge with little quantitative empirical evidence, this paper provides updated support for the positive effect of OGD strategies and points out areas of improvement in the value that citizens can get from OGD coverage and openness.

A Rebalancing Framework for Classification of Imbalanced Medical Appointment No-show Data
Ulagapriya Krishnan, Pushpa Sangar. Journal of Data and Information Science, 6(1), 178-192. Published 2021-01-27. https://doi.org/10.2478/jdis-2021-0011

Abstract
Purpose: This paper aims to improve classification performance on imbalanced data by applying different sampling techniques available in machine learning.
Design/methodology/approach: The medical appointment no-show dataset is imbalanced, and when classification algorithms are applied directly to it, they are biased towards the majority class and ignore the minority class. To avoid this issue, multiple sampling techniques, namely Random Over Sampling (ROS), Random Under Sampling (RUS), Synthetic Minority Oversampling TEchnique (SMOTE), ADAptive SYNthetic sampling (ADASYN), Edited Nearest Neighbor (ENN), and Condensed Nearest Neighbor (CNN), are applied to balance the dataset. Performance is assessed with a Decision Tree classifier under each of the listed sampling techniques, and the best-performing technique is identified.
Findings: This study compares the performance metrics of various widely used sampling methods. It reveals that recall is highest when ENN is applied; CNN and ADASYN performed equally well on the imbalanced data.
Research limitations: Testing was carried out on a limited dataset and needs to be repeated on a larger one.
Practical implications: This framework will be useful whenever data is imbalanced in real-world scenarios, where it ultimately improves performance.
Originality/value: This paper applies the rebalancing framework to the medical appointment no-show dataset to predict no-shows and removes the bias against the minority class.

Why Open Government Data? The Case of a Swedish Municipality
Koraljka Golub, Arwid Lund. Journal of Data and Information Science, 6(1), 120-138. Published 2021-01-27. https://doi.org/10.2478/jdis-2021-0012

Abstract
Purpose: The purpose of this exploratory study is to provide modern local governments with potential use cases for their open data, in order to help inform related future policies and decision-making. The concrete context is the Växjö municipality in southeastern Sweden.
Design/methodology/approach: The methodology was two-fold: 1) a survey of potential end users (n=151) at a local university; and 2) analysis of the survey results using a theoretical model of local strategies for implementing open government data.
Findings: The datasets predicted to be most useful concerned sustainability and environment; preschool and school; and municipality and politics. The use contexts given were primarily research and development and informing policies and decision-making, but also education, informing personal choices, informing citizens, and creating services based on open data. Not least, a need to educate target user groups in data literacy emerged. A tentative pattern was identified comprising a technical perspective on open data and a social perspective on open government.
Research limitations: In line with available funding, the study was exploratory and implemented as an anonymous web-based survey of employees and students at the local university. Further research involving (qualitative) surveys of all stakeholders would allow a more complete picture of the matter.
Practical implications: The study identifies potential use cases and use contexts for open government data, in order to help inform related future policies and decision-making.
Originality/value: Modern local governments, especially in Sweden, face the challenge of how to make their data open, how to learn which types of data will be most relevant to their end users, and what the different societal purposes will be. The paper contributes knowledge that modern local governments can draw on regarding local citizens' attitudes to open government data.

{"title":"New Editorial Board Announced for Journal of Data and Information Science","authors":"","doi":"10.2478/jdis-2021-0026","DOIUrl":"https://doi.org/10.2478/jdis-2021-0026","url":null,"abstract":"","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"1 1","pages":"164-165"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83059511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Network Embedding to Obtain a Richer and More Stable Network Layout for a Large Scale Bibliometric Network
Tingting Chen, Guopeng Li, Qiping Deng, Xiaomei Wang. Journal of Data and Information Science, 6(1), 154-177. Published 2020-12-08. https://doi.org/10.2478/jdis-2021-0006

Abstract
Purpose: The goal of this study is to explore whether deep learning based network embedding models can provide a better visualization solution for large citation networks.
Design/methodology/approach: Our team compared the visualization approach borrowed from the deep learning community with well-known bibliometric network visualization for large-scale data. 47,294 highly cited papers were visualized using three network embedding models plus the t-SNE dimensionality reduction technique. In addition, three base maps were created with the same dataset for evaluation purposes. All base maps used the classic OpenOrd method with different edge-cutting strategies and parameters.
Findings: The network embedding maps with t-SNE preserve a global structure very similar to the classic full-edge force-directed map, while the maps vary in local structure. Among them, the Node2Vec model has the best overall visualization performance: local structure is significantly improved and the maps' layout is highly stable.
Research limitations: The computational and time costs of training network embedding models to obtain high-dimensional latent vectors are very high. Only one dimensionality reduction technique was tested.
Practical implications: This paper demonstrates that network embedding models are able to accurately reconstruct a large bibliometric network in vector space. In the future, apart from network visualization, many classical vector-based machine learning algorithms can be applied to network representations to solve bibliometric analysis tasks.
Originality/value: This paper provides the first systematic comparison of classical science mapping visualization with network embedding based visualization on a large-scale dataset. We show that a deep learning based network embedding model with t-SNE can provide a richer, more stable science map. We also designed a practical evaluation method to investigate and compare maps.

“My ADHD Hellbrain”: A Twitter Data Science Perspective on a Behavioural Disorder
M. Thelwall, Meiko Makita, Amalia Más-Bleda, E. Stuart. Journal of Data and Information Science, 6(1), 13-34. Published 2020-12-08. https://doi.org/10.2478/jdis-2021-0007

Abstract
Purpose: Attention deficit hyperactivity disorder (ADHD) is a common behavioural condition. This article introduces a new data science method, word association thematic analysis, to investigate whether ADHD tweets can give insights into patient concerns and online communication needs.
Design/methodology/approach: Tweets matching “my ADHD” (n=58,893) and 99 other conditions (n=1,341,442) were gathered and two thematic analyses conducted. Analysis 1: a standard thematic analysis of ADHD-related tweets. Analysis 2: a word association thematic analysis of themes unique to ADHD.
Findings: The themes that emerged from the two analyses included people ascribing agency to their brains to explain and justify their symptoms, and using the concept of neurodivergence for a positive self-image.
Research limitations: This is a single case study and the results may differ for other topics.
Practical implications: Health professionals should be sensitive to patients' needs to understand their behaviour, to find ways to justify and explain it to others, and to be positive about their condition.
Originality/value: Word association thematic analysis can give new insights into the (self-reported) patient perspective.

Overview of Trends in Global Single Cell Research Based on Bibliometric Analysis and LDA Model (2009–2019)
Tian Jiang, Xiaoping Liu, Chao Zhang, Chuanhao Yin, Huizhou Liu. Journal of Data and Information Science, 6(1), 163-178. Published 2020-11-27. https://doi.org/10.2478/jdis-2021-0008

Abstract
Purpose: This article describes the global research profile and development trends of single cell research from the perspective of bibliometric analysis and semantic mining.
Design/methodology/approach: The literature on single cell research was extracted from Clarivate Analytics' Web of Science Core Collection for 2009 to 2019. First, bibliometric analyses were performed with Thomson Data Analyzer (TDA). Second, topic identification and evolution trends in single cell research were derived with the LDA topic model. Third, borrowing the post-discretized method used for topic evolution analysis, the topics were also distributed across countries to detect their spatial distribution.
Findings: The publication of single cell research shows a significantly increasing tendency over the last decade. Topics in the single cell research field fall into three categories: single cell research methods, mechanisms of biological processes, and clinical applications of single cell technologies. The different trends of these categories indicate that technological innovation drives the development of applied research. The continuous and rapid growth of topic strength in cancer diagnosis and treatment indicates that this research topic has received extensive attention in recent years. The topic distributions of some countries are relatively balanced, while for other countries several topics show significant superiority.
Research limitations: The analyzed data only cover publications included in the Web of Science Core Collection.
Practical implications: This study provides insights into research progress in the single cell field and identifies the most-attended topics, which reflect potential opportunities and challenges. The national topic distribution analysis based on the post-discretized method extends topic analysis from the time dimension to the space dimension.
Originality/value: This paper combines bibliometric analysis and the LDA model to analyze evolution trends in the single cell research field. Extending post-discretized analysis from the time dimension to the space dimension is distinctive and insightful.

A Scientometric Approach to Analyze Scientific Development on Renewable Energy Sources
J. L. Schaefer, J. Siluk, Ismael Cristofer Baierle, Elpídio Oscar Benitez Nara. Journal of Data and Information Science, 6(1), 87-119. Published 2020-11-27. https://doi.org/10.2478/jdis-2021-0009

Abstract
Purpose: This paper maps the scientific development and research density of renewable energy sources such as photovoltaic, wind, and biomass, using a mix of computational tools. On this basis, it was possible to identify new research trends and opportunities, from a macro view, regarding management, performance evaluation, and decision-making in renewable energy generation systems and installations.
Design/methodology/approach: A scientometric approach based on a research protocol was used to retrieve papers from the Scopus database and, through four scientometric questions, to analyze each area. The Science Mapping Analysis Software Tool (SciMAT) and the Sci2 Tool were used to map scientific development and density.
Findings: The scientific development of the renewable energy areas is highlighted, pointing out research opportunities regarding management, studies on costs and investments, systemic diagnosis, and performance evaluation for decision-making in businesses in these areas.
Research limitations: This paper was limited to articles indexed in the Scopus database and to the questions used to analyze the scientific development of the renewable energy areas.
Practical implications: The results show the need for a managerial perspective in businesses related to renewable energy sources at the managerial, technical, and operational levels, including performance evaluation, assertive decision-making, and adequate use of technical and financial resources.
Originality/value: This paper shows that there is a research field to be explored, with gaps to fill and further research to be carried out. It can also serve as a basis for studies and research in other areas and domains.

Are University Rankings Statistically Significant? A Comparison among Chinese Universities and with the USA
L. Leydesdorff, C. Wagner, Lin Zhang. Journal of Data and Information Science, 6(1), 67-95. Published 2020-11-16. https://doi.org/10.2139/ssrn.3731776

Abstract
Purpose: Building on Leydesdorff, Bornmann, and Mingers (2019), we elaborate the differences between Tsinghua and Zhejiang University as an empirical example. We address the question of whether differences in the rankings of Chinese universities are statistically significant, and we propose methods for measuring statistical significance among different universities within or among countries.
Design/methodology/approach: Based on z-testing and overlapping confidence intervals, and using data on the 205 Chinese universities included in the Leiden Rankings 2020, we argue that three main groups of Chinese research universities can be distinguished (low, middle, and high).
Findings: When the sample of 205 Chinese universities is merged with the 197 US universities included in the Leiden Rankings 2020, the results similarly indicate three main groups: low, middle, and high. Using these data (Leiden Rankings and Web of Science), the z-scores of the Chinese universities are significantly below those of the US universities, albeit with some overlap.
Research limitations: We show empirically that differences in ranking may be due to changes in the data, the models, or the modeling effects on the data. The scientometric groupings are not always stable when we use different methods.
Practical implications: Differences among universities can be tested for statistical significance. The statistics relativize the value of decimals in the rankings. One can operate with a low/middle/high scheme in policy debates and leave the more fine-grained rankings of individual universities to operational management and local settings.
Originality/value: In the discussion about university rankings, the question of whether differences are statistically significant has, in our opinion, been insufficiently addressed in research evaluations.

Exploring the Potentialities of Automatic Extraction of University Webometric Information
Gianpiero Bianchi, R. Bruni, C. Daraio, A. Palma, G. Perani, Francesco Scalfati. Journal of Data and Information Science, 5(1), 43-55. Published 2020-11-01. https://doi.org/10.2478/jdis-2020-0040

Abstract
Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities' websites. The information automatically extracted can potentially be updated more frequently than once per year and is safe from manipulation or misinterpretation. Moreover, this approach gives us flexibility in collecting indicators of the efficiency of universities' websites and their effectiveness in disseminating key contents. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions that allow new insights for "profiling" the analyzed universities.
Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information used to compute our indicators was extracted from the universities' websites using web scraping and text mining techniques. The scraped information was stored in a NoSQL database in semi-structured form so that it could be retrieved efficiently by text mining techniques; this provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data were also collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the web was combined with university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at the European level. All the above was used to perform a clustering of 79 Italian universities based on structural and digital indicators.
Findings: The main findings of this study concern the evaluation of universities' potential for digitalization, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities' websites. These indicators can complement traditional indicators, and clustering techniques working with them can identify groups of universities with common features.
Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad.
Practical implications: The approach proposed in this study, and its illustration on Italian universities, shows the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance.

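The scraping-and-storage step can be sketched with requests and BeautifulSoup. The URL, field names, and the JSON-lines file standing in for a NoSQL store are all assumptions, not the authors' pipeline.

```python
# Sketch: scrape a university page and store it in semi-structured form
# (a JSON document, standing in for a NoSQL store such as MongoDB).
# URL and field names are illustrative assumptions.
import json
import requests
from bs4 import BeautifulSoup

url = "https://www.example-university.edu"   # hypothetical university site
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

record = {
    "url": url,
    "title": soup.title.string if soup.title else None,
    "links": [a["href"] for a in soup.find_all("a", href=True)],  # structure mining input
    "text": soup.get_text(separator=" ", strip=True),             # content mining input
}

# A document store would accept this dict directly; a JSON-lines file
# keeps the same semi-structured flexibility for later text mining.
with open("scraped_pages.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Keeping the raw text and link structure in semi-structured documents is what allows new indicators to be defined after the fact, without re-scraping the sites.
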