Andrea Ceroni, Mihai Georgescu, U. Gadiraju, Kaweh Djafari Naini, M. Fisichella
The Web of data is constantly evolving based on the dynamics of its content. Current Web search engine technologies consider static collections and do not factor in explicitly or implicitly available temporal information that can be leveraged to gain insights into the dynamics of the data. In this paper, we hypothesize that by employing the temporal aspect as the primary means for capturing the evolution of entities, it is possible to provide entity-based accessibility to Web archives. We empirically show that the edit activity on Wikipedia can be exploited to provide evidence of the evolution of Wikipedia pages over time, both in terms of their content and in terms of their temporally defined relationships, classified in the literature as events. Finally, we present results from our extensive analysis of a dataset consisting of 31,998 Wikipedia pages describing politicians, and observations from in-depth case studies. Our findings reflect the usefulness of leveraging temporal information to study the evolution of entities and provide promising grounds for further research.
{"title":"Information Evolution in Wikipedia","authors":"Andrea Ceroni, Mihai Georgescu, U. Gadiraju, Kaweh Djafari Naini, M. Fisichella","doi":"10.1145/2641580.2641612","DOIUrl":"https://doi.org/10.1145/2641580.2641612","url":null,"abstract":"The Web of data is constantly evolving based on the dynamics of its content. Current Web search engine technologies consider static collections and do not factor in explicitly or implicitly available temporal information, that can be leveraged to gain insights into the dynamics of the data. In this paper, we hypothesize that by employing the temporal aspect as the primary means for capturing the evolution of entities, it is possible to provide entity-based accessibility to Web archives. We empirically show that the edit activity on Wikipedia can be exploited to provide evidence of the evolution of Wikipedia pages over time, both in terms of their content and in terms of their temporally defined relationships, classified in literature as events. Finally, we present results from our extensive analysis of a dataset consisting of 31,998 Wikipedia pages describing politicians, and observations from in-depth case studies. Our findings reflect the usefulness of leveraging temporal information in order to study the evolution of entities and breed promising grounds for further research.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125123261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
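The Ceroni et al. abstract above says edit activity can serve as evidence of how a page evolves. As a rough illustration of that idea only (this is not the authors' method), the sketch below buckets a page's revision timestamps by month and flags months whose edit count exceeds the mean plus `k` standard deviations. The function names, the monthly granularity, and the threshold `k = 2.0` are all assumptions chosen for illustration; the timestamps could come from a page's revision history.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def month_range(start, end):
    """Yield every (year, month) pair from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def monthly_series(timestamps):
    """Count revisions per month, filling months without edits with 0."""
    counts = Counter((ts.year, ts.month) for ts in timestamps)
    if not counts:
        return {}
    return {mo: counts.get(mo, 0) for mo in month_range(min(counts), max(counts))}


def flag_bursts(timestamps, k=2.0):
    """Return months whose edit count exceeds mean + k * population std dev.

    Hypothetical burst criterion, chosen only to illustrate the idea.
    """
    series = monthly_series(timestamps)
    if len(series) < 2:
        return []
    values = list(series.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(mo for mo, n in series.items() if n > threshold)


if __name__ == "__main__":
    # Hypothetical history: one edit per month in 2013, plus a spike in June.
    history = [datetime(2013, m, 1) for m in range(1, 13) if m != 6]
    history += [datetime(2013, 6, d) for d in range(1, 21)]
    print(flag_bursts(history))  # prints [(2013, 6)]
```

Filling empty months with zero keeps quiet periods in the baseline. Without that step, a page edited only in rare bursts would look uniformly active.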
A semantic wiki is a wiki that has a model of the knowledge contained in its pages. Currently, semantic wikis have not been adopted by a large user base, because most implementations are research prototypes that implement their own wiki engine. To increase familiarity with semantic wikis and to speed the adoption of semantic technologies, we present Strata, a plugin for the well-known wiki DokuWiki. Strata allows the use of semi-structured data in any DokuWiki installation, normalizes values based on their types, and allows extensive data modeling and querying on complex data structures.
{"title":"Strata: Typed Semi-Structured Data in DokuWiki","authors":"Brend Wanders, Steven te Brinke","doi":"10.1145/2641580.2641636","DOIUrl":"https://doi.org/10.1145/2641580.2641636","url":null,"abstract":"A semantic wiki is a wiki that has a model of the knowledge contained in its pages. Currently, semantic wikis are not adopted by a large user base, because most implementations are research prototypes that implement their own wiki engine. To increase familiarity with semantic wikis and quick adoption of semantic technologies we present Strata, a plugin for the well known wiki DokuWiki. Strata allows the use of semi-structured data in any DokuWiki installation, normalizes values based on their types, and allows extensive data modeling and querying on complex data structures.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126398162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wikipedia needs to attract and retain newcomers while also increasing the quality of its content. Yet new Wikipedia users are disproportionately affected by the quality assurance mechanisms designed to thwart spammers and promoters. English Wikipedia's Articles for Creation provides a protected space for drafting new articles, which are reviewed against minimum quality guidelines before they are published. In this study we explore how this drafting process has affected the productivity of newcomers in Wikipedia. Using a mixed qualitative and quantitative approach, we show how the process's pre-publication review, which is intended to improve the success of newcomers, in fact decreases newcomer productivity in English Wikipedia, and we offer recommendations for system designers.
{"title":"Accept, decline, postpone: How newcomer productivity is reduced in English Wikipedia by pre-publication review","authors":"Jodi Schneider, Bluma S. Gelley, Aaron L Halfaker","doi":"10.1145/2641580.2641614","DOIUrl":"https://doi.org/10.1145/2641580.2641614","url":null,"abstract":"Wikipedia needs to attract and retain newcomers while also increasing the quality of its content. Yet new Wikipedia users are disproportionately affected by the quality assurance mechanisms designed to thwart spammers and promoters. English Wikipedia's Articles for Creation provides a protected space for drafting new articles, which are reviewed against minimum quality guidelines before they are published. In this study we explore how this drafting process has affected the productivity of newcomers in Wikipedia. Using a mixed qualitative and quantitative approach, we show how the process's pre-publication review, which is intended to improve the success of newcomers, in fact decreases newcomer productivity in English Wikipedia and offer recommendations for system designers.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127452172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The role of open data in air transport research is analyzed by means of a sample of over 300 research articles. The most used (or available) data types, their sources and their access policies are identified, both for the US and the EU. The analyses show that 70% of research in air transport is heavily reliant on data, that 70% of the data sources are curated by governmental bodies, and that the US publicizes a wider set of sources, leading to wider usage. Areas for improving the accessibility of (mainly European) data sources are outlined and alternative avenues to obtain data are sketched. Because Europe lags considerably in making its sources readily available to the research community, it is missing out on entrepreneurship, innovation and scientific discovery, the presumed benefits of open data.
{"title":"Open Data for Air Transport Research: Dream or Reality?","authors":"M. Bourgois, Michael Sfyroeras","doi":"10.1145/2641580.2641602","DOIUrl":"https://doi.org/10.1145/2641580.2641602","url":null,"abstract":"The role of open data in air transport research is analyzed by means of a sample of over 300 research articles. The most used (or available) data types, their sources and their access policies are identified, both for the US and the EU. The analyses show that 70% of research in air transport is heavily reliant on data, that 70% of the data sources are curated by governmental bodies and that the US publicizes a wider set of sources, leading to wider usage. Areas for improving accessibility of (mainly European) data sources are outlined and alternative avenues to obtain data are sketched. The fact that Europe is lagging considerably in making its sources readily available to the research community means Europe missing out on entrepreneurship, innovation and scientific discovery, the presumed benefits of open data.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121739499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User-generated content in Wikis is mainly distributed across the article view and its corresponding talk page. The potential of analysing and supporting discussants' knowledge construction processes at the level of talk pages has rarely been researched. The presented experimental study addresses this issue by providing external representations of content-related controversies, arising from contradictory evidence between discussants, to foster awareness of socio-cognitive conflicts, which can be beneficial for learning. Its aim is to investigate how increased salience of controversies can guide participants' (N = 81) navigation and learning processes. Three conditions differing in their degree of awareness support were implemented in this study. Results indicate that the awareness representations helped students to focus on meaningful discussion threads. Findings suggest that Wiki talk page users can benefit from additional structuring aids.
{"title":"Supporting awareness of content-related controversies in a Wiki-based learning environment","authors":"Sven Heimbuch, Daniel Bodemer","doi":"10.1145/2641580.2641607","DOIUrl":"https://doi.org/10.1145/2641580.2641607","url":null,"abstract":"User generated content in Wikis is mainly distributed on the article view and its corresponding talk page. Potentials of analysing and supporting discussants' knowledge construction processes on the level of talk pages have still been rarely researched. The presented experimental study addresses this issue by providing external representations of content-related controversies which were led by contradictory evidence between discussants to foster awareness on socio-cognitive conflicts which can be beneficial for learning. Its aim is to investigate how increased salience of controversies can guide participants' (N = 81) navigation and learning processes. Three conditions differing in their degree of awareness support were implemented in this study. Results indicate that the implementation of awareness representations helped students to focus on meaningful discussion threads. Findings suggest that Wiki talk page users can benefit from additional structuring aids.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114691317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erik Borra, Esther Weltevrede, P. Ciuccarelli, Andreas Kaltenbrunner, David Laniado, Giovanni Magni, Michele Mauri, Richard A. Rogers, T. Venturini
Collaborative content creation inevitably reaches situations where different points of view lead to conflict. In Wikipedia, one of the most prominent examples of collaboration online, conflict is mediated by both policy and software, and conflicts often reflect larger societal debates. Contropedia is a platform for the analysis and visualization of such controversies in Wikipedia. Controversy metrics are extracted from activity streams generated by edits to, and discussions about, individual articles and groups of related articles. An article's revision history and its corresponding discussion pages constitute two parallel streams of user interactions that, taken together, fully describe the process of the collaborative creation of an article. Contropedia builds on state-of-the-art techniques, extends current metrics for the analysis of both edit and discussion activity, and visualizes both as a layer on top of Wikipedia articles and as a dashboard view presenting additional analytics. Furthermore, the combination of these two approaches allows for a deeper understanding of the substance, composition, actor alignment, trajectory and liveliness of controversies on Wikipedia. Our research aims to provide a better understanding of socio-technical phenomena that take place on the web and to equip citizens with tools to fully deploy the complexity of controversies. Contropedia is useful for the general public as well as user groups with specific interests such as scientists, students, data journalists, decision makers and media communicators. Contropedia can be found at http://contropedia.net.
{"title":"Contropedia - the analysis and visualization of controversies in Wikipedia articles","authors":"Erik Borra, Esther Weltevrede, P. Ciuccarelli, Andreas Kaltenbrunner, David Laniado, Giovanni Magni, Michele Mauri, Richard A. Rogers, T. Venturini","doi":"10.1145/2641580.2641622","DOIUrl":"https://doi.org/10.1145/2641580.2641622","url":null,"abstract":"Collaborative content creation inevitably reaches situations where different points of view lead to conflict. In Wikipedia, one of the most prominent examples of collaboration online, conflict is mediated by both policy and software, and conflicts often reflect larger societal debates. Contropedia is a platform for the analysis and visualization of such controversies in Wikipedia. Controversy metrics are extracted from activity streams generated by edits to, and discussions about, individual articles and groups of related articles. An article's revision history and its corresponding discussion pages constitute two parallel streams of user interactions that, taken together, fully describe the process of the collaborative creation of an article. Our proposed platform, Contropedia, builds on state of the art techniques and extends current metrics for the analysis of both edit and discussion activity and visualizes these both as a layer on top of Wikipedia articles as well as a dashboard view presenting additional analytics. Furthermore, the combination of these two approaches allows for a deeper understanding of the substance, composition, actor alignment, trajectory and liveliness of controversies on Wikipedia. Our research aims to provide a better understanding of socio-technical phenomena that take place on the web and to equip citizens with tools to fully deploy the complexity of controversies. Contropedia is useful for the general public as well as user groups with specific interests such as scientists, students, data journalists, decision makers and media communicators. Contropedia can be found at http://contropedia.net.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114314734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The nature of software development has changed significantly over the last decade or so, driven by trends such as an increasing level of software outsourcing, distributed development and collaborative development models. One such model of collaborative and distributed development that has attracted significant attention in both industry and research communities is Open Source. Open Source development seems to defy traditional wisdom in software development: despite the apparent absence of a predefined process, open source communities have produced high-quality and successful products. Increasingly, large organizations are looking to reproduce such emerging and collaborative development projects by adopting the open source development paradigm within their organizations. This phenomenon is labelled "Inner Source". This talk will present the results of four years of research into Inner Source. Specifically, the talk will address questions such as why companies would want to adopt Inner Source and what factors are important when adopting it. The talk will draw on several industry cases of Inner Source.
{"title":"Inner Source: Coming to a Company Near You Soon!","authors":"Klaas-Jan Stol","doi":"10.1145/2641580.2641584","DOIUrl":"https://doi.org/10.1145/2641580.2641584","url":null,"abstract":"The nature of software development has changed significantly over the last decade or so, driven by trends such as an increasing level of software outsourcing, distributed development and collaborative development models. One such model of collaborative and distributed development that has attracted significant attention in both industry and research communities is that of Open Source. Open Source development seems to defy traditional wisdom in software development --- with a seeming absence of a predefined process, open source communities have produced high-quality and successful products. Increasingly, large organizations are looking to reproduce such emerging and collaborative development projects by adopting the open source development paradigm within their organizations. This phenomenon is labelled \"Inner Source\". This talk will present the results of four years of research into Inner Source. Specifically, the talk will address questions such as why companies would want to adopt Inner Source and what factors are important when adopting Inner Source. The talk will draw from several industry cases of Inner Source.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129740388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the last two years we have been developing Wikidata and building up a community around it. Wikidata is Wikimedia's central repository for structured data. This is the place where data, like the number of inhabitants of a country, is stored and made accessible to humans and computers alike. The data is used across all 287 language editions of Wikipedia and its sister projects, as well as in projects outside of Wikimedia. In this talk we will take a look at how we developed Wikidata, what great tools are being built on top of it and what is in store for the future.
{"title":"Wikidata: How We Brought Structured Data to Wikipedia","authors":"D. Kinzler, Lydia Pintscher","doi":"10.1145/2641580.2641583","DOIUrl":"https://doi.org/10.1145/2641580.2641583","url":null,"abstract":"Over the last two years we have been developing Wikidata and build up a community around it. Wikidata is Wikimedia's central repository for structured data. This is the place where data, like the number of inhabitants of a country, is stored and made accessible to humans and computers alike. The data is used across all 287 language editions of Wikipedia and its sister projects as well as in projects outside of Wikimedia. In this talk we will take a look at how we developed Wikidata, what great tools are being built on top of it and what is in store for the future.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133419530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Around the world, national and municipal governments launch open data initiatives with declared goals like increased efficiency, transparency or economic growth. However, although few of these effects have been proven, more and more administrations open up their datasets to the public. The dissertation project describes this phenomenon as the ongoing institutionalization of digital openness in the field of public sector information. With empirical evidence from three case studies in large European cities, the research project intends to theorize how NGOs, hackers and certain civil servants turn open data into an institution to which more and more public bodies feel the need to adapt.
{"title":"\"The Institutionalization of Digital Openness\": How NGOs, Hackers and Civil Servants Organize Municipal Open Data Ecosystems","authors":"Maximilian Heimstädt","doi":"10.1145/2641580.2641626","DOIUrl":"https://doi.org/10.1145/2641580.2641626","url":null,"abstract":"Around the world national and municipal governments launch open data initiatives with declared goals like increased efficiency, transparency or economic growth. However, although little of these effects have been proven, more and more administrations open up their datasets to the public. The dissertation project describes this phenomenon as the ongoing institutionalization of digital openness in the field of public sector information. With empirical evidence from three case studies in large European cities the research project intends to theorize how NGOs, hackers and certain civil servants turn open data into an institution, which more and more public bodies feel the need to adapt to.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132375119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a method of geolinguistic normalization to advance the existing comparative analysis of open collaborative communities, using multilingual Wikipedia projects as an example. Such normalization requires data regarding the potential users and/or resources of a geolinguistic unit.
{"title":"Geographic and linguistic normalization: towards a better understanding of the geolinguistic dynamics of knowledge","authors":"H. Liao, T. Petzold","doi":"10.1145/2641580.2641623","DOIUrl":"https://doi.org/10.1145/2641580.2641623","url":null,"abstract":"This paper proposes a method of geo-linguistic normalization to advance the existing comparative analysis of open collaborative communities, with multilingual Wikipedia projects as the example. Such normalization requires data regarding the potential users and/or resources of a geolinguistic unit.","PeriodicalId":447989,"journal":{"name":"Proceedings of The International Symposium on Open Collaboration","volume":"289 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133821457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
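The Liao and Petzold abstract above notes that normalization needs data on a geolinguistic unit's potential users. A minimal sketch of the general idea, assuming the simplest form (raw activity divided by the size of the potential user base, expressed per million), is given below. It is not the paper's method. The record type, field names, and the per-million scale are hypothetical, and the numbers in the usage block are illustrative, not data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeolinguisticUnit:
    """Hypothetical record for one language edition in one region."""
    name: str
    raw_activity: int     # e.g. active editors or articles (assumed metric)
    potential_users: int  # e.g. estimated speakers with Internet access (assumed)


def normalized_activity(unit, per=1_000_000):
    """Raw activity per `per` potential users."""
    if unit.potential_users <= 0:
        raise ValueError(f"{unit.name}: potential_users must be positive")
    return unit.raw_activity * per / unit.potential_users


def rank_by_normalized_activity(units, per=1_000_000):
    """Sort units from most to least active relative to their user base."""
    return sorted(units, key=lambda u: normalized_activity(u, per), reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    large = GeolinguisticUnit("Edition A", raw_activity=5_000, potential_users=100_000_000)
    small = GeolinguisticUnit("Edition B", raw_activity=1_000, potential_users=5_000_000)
    for unit in rank_by_normalized_activity([large, small]):
        print(unit.name, normalized_activity(unit))
```

In this toy case, Edition A has five times the raw activity of Edition B. Relative to its potential users, however, it is a quarter as active (50 vs. 200 per million). Unnormalized cross-language comparisons hide exactly this kind of difference.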