Keep a Human in the Machine and Other Lessons Learned from Deploying and Maintaining Colandr
Authors: Samantha Cheng, C. Augustin
Journal: Chance (New York, N.Y.)
Volume: 20 1
Pages: 56-60
Publication date: 2021-07-03
Publication type: Journal Article
DOI: https://doi.org/10.1080/09332480.2021.1979818
Citations: 0
Abstract
The rise of the “invest in open” movement has increased both the focus on open source toolkits and the recognition of their benefits as a way of democratizing access to software that is typically expensive and therefore restricted to research institutions. From 2015 to 2017, DataKind partnered with researchers from NCEAS through the SNAPP Consortium to tackle a problem common across many sectors: how to digest the volume of evidence available for decision-making while still allowing decisions to be made in a timely manner. Realizing that the evidence was stored primarily in PDFs, and that the rise of machine learning techniques such as natural language processing meant that thousands of words from PDFs could be processed on local computers, the research team set out to build an open source tool to compete with commercially available evidence synthesis tools. With an algorithmic backend that relies on word vectorization, this project is an example of using technology to aid common, labor-intensive researcher tasks. From inception to early maintenance, the project produced many valuable lessons about the launch and stewardship of a public good, and this article reflects on the lessons learned across that process.