The LibGuides platform is a ubiquitous tool in academic libraries and is commonly used by librarians to compile and share lists of recommended social science numerical data resources with users. This study leverages the machine-accessible nature of the LibGuides platform to collect links to data and statistical resources from over 10,000 LibGuide pages at 123 R1 research institutions. After substantial data cleaning and normalization, an analysis of the most common resources on those guides provides a unique window into the data repositories, libraries, archives, statistical data platforms, and other machine-readable data sources that are most popular on academic library guides. Results show that freely available resources from U.S. government agencies are among those most commonly included on data and statistical resources guides across institutions. Resources requiring paid licenses or memberships for full access, such as Statistical Insight (ProQuest), Social Explorer, and ICPSR, are linked to most frequently overall, regardless of the percentage of institutions that include them. Findings also suggest that libraries are more likely to share traditional licensed statistical resources (e.g., Cambridge’s Historical Statistics of the United States) and collections of simple charts and graphs (e.g., Statista) than more robust and complex microdata resources (e.g., IPUMS).
{"title":"Taking count: A computational analysis of data resources on academic LibGuides","authors":"C. Hennesy, Alicia Kubas, J. McBurney","doi":"10.29173/iq1040","DOIUrl":"https://doi.org/10.29173/iq1040","url":null,"abstract":"The LibGuides platform is a ubiquitous tool in academic libraries and is commonly used by librarians to compile and share lists of recommended social science numerical data resources with users. This study leverages the machine-accessible nature of the LibGuides platform to collect links to data and statistical resources from over 10,000 LibGuide pages at 123 R1 research institutions. After substantial data cleaning and normalization, an analysis of the most common resources on those guides provides a unique window into the data repositories, libraries, archives, statistical data platforms, and other machine-readable data sources that are most popular on academic library guides. Results show that freely available resources from U.S. government agencies are among the most common to be included on data and statistical resources guides across institutions. Resources requiring paid licenses or memberships for full access, such as Statistical Insight (ProQuest), Social Explorer, and ICPSR are linked to most frequently overall, regardless of the percentage of institutions that include them. 
Findings also suggest that libraries are more likely to share traditional licensed statistical resources (e.g., Cambridge’s Historical Statistics of the United States) and collections of simple charts and graphs (e.g., Statista) than more robust and complex microdata resources (e.g., IPUMS).","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42346586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lucas Hertzog, Jenny Chen-Charles, Camille Wittesaele, K. de Graaf, Ray Titus, Jane-Frances Kelly, N. Langwenya, L. Baerecke, B. Banougnin, W. Saal, John Southall, L. Cluver, E. Toska
Recent data protection regulatory frameworks, such as the Protection of Personal Information Act (POPI Act, also abbreviated POPIA) in South Africa and the General Data Protection Regulation (GDPR) in the European Union, impose governance requirements for research involving high-risk and vulnerable groups such as children and adolescents. Our paper's objective is to unpack what constitutes adequate safeguards to protect the personal information of these vulnerable populations. We suggest strategies to adhere meaningfully to the principal aims of data protection regulations. Navigating this within established research projects raises questions about how to interpret regulatory frameworks so that they build on mechanisms already used by researchers. We therefore explore a series of best practices in safeguarding the personal information of children, adolescents and young people (0-24 years old), who represent more than half of sub-Saharan Africa's population. We discuss the actions taken by the research group to ensure that regulations such as GDPR and POPIA effectively build on existing data protection mechanisms for research projects at all stages, focusing on promoting regulatory alignment throughout the data lifecycle. Our goal is to stimulate a broader conversation on improving the protection of sensitive personal information of children, adolescents and young people in sub-Saharan Africa. We join this discussion as a research group generating evidence that influences social and health policy and programming for young people in sub-Saharan Africa. Our contribution draws on our work adhering to multiple transnational governance frameworks imposed by national legislation (such as data protection regulations), funders, and academic institutions.
{"title":"Data management instruments to protect the personal information of children and adolescents in sub-Saharan Africa","authors":"Lucas Hertzog, Jenny Chen-Charles, Camille Wittesaele, K. de Graaf, Ray Titus, Jane-Frances Kelly, N. Langwenya, L. Baerecke, B. Banougnin, W. Saal, John Southall, L. Cluver, E. Toska","doi":"10.29173/iq1044","DOIUrl":"https://doi.org/10.29173/iq1044","url":null,"abstract":"Recent data protection regulatory frameworks, such as the Protection of Personal Information Act (POPI Act) in South Africa and the General Data Protection Regulation (GDPR) in the European Union, impose governance requirements for research involving high-risk and vulnerable groups such as children and adolescents. Our paper's objective is to unpack what constitutes adequate safeguards to protect the personal information of vulnerable populations such as children and adolescents. We suggest strategies to adhere meaningfully to the principal aims of data protection regulations. Navigating this within established research projects raises questions about how to interpret regulatory frameworks to build on existing mechanisms already used by researchers. Therefore, we will explore a series of best practices in safeguarding the personal information of children, adolescents and young people (0-24 years old), who represent more than half of sub-Saharan Africa's population. We discuss the actions taken by the research group to ensure regulations such as GDPR and POPIA effectively build on existing data protection mechanisms for research projects at all stages, focusing on promoting regulatory alignment throughout the data lifecycle. Our goal is to stimulate a broader conversation on improving the protection of sensitive personal information of children, adolescents and young people in sub-Saharan Africa. We join this discussion as a research group generating evidence influencing social and health policy and programming for young people in sub-Saharan Africa. 
Our contribution draws on our work adhering to multiple transnational governance frameworks imposed by national legislation, such as data protection regulations, funders, and academic institutions.","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44274109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
João Aguiar Castro, Joana Rodrigues, Paula Mena Matos, Célia M D Sales, Cristina Ribeiro
To address metadata with researchers, it is important to use models that include familiar domain concepts. In the Social Sciences, the DDI is a well-accepted source of such domain concepts. To create FAIR data and metadata, we need to establish a compact set of DDI elements that fit project requirements and are likely to be adopted by researchers inexperienced with metadata creation. Over time, we have engaged in interviews and data description sessions with research groups in the Social Sciences, identifying a manageable DDI subset. A recent Clinical Psychology project, TOGETHER, dealing with risk assessment for hereditary cancer, considered the inclusion of a DDI subset for the production of metadata that are timely and interoperable with data publication initiatives in the same domain. Taking a DDI subset identified by the data curators, we make a preliminary assessment of whether its use is a realistic effort on the part of researchers, taking into consideration the metadata created in two data description sessions, the effort involved, and overall metadata quality. A follow-up questionnaire was used to assess the perspectives of researchers regarding data description.
{"title":"Getting in touch with metadata: a DDI subset for FAIR metadata production in clinical psychology","authors":"João Aguiar Castro, Joana Rodrigues, Paula Mena Matos, Célia M D Sales, Cristina Ribeiro","doi":"10.29173/iq1008","DOIUrl":"https://doi.org/10.29173/iq1008","url":null,"abstract":"To address metadata with researchers it is important to use models that include familiar domain concepts. In the Social Sciences, the DDI is a well-accepted source of such domain concepts. To create FAIR data and metadata, we need to establish a compact set of DDI elements that fit the requirements in projects and are likely to be adopted by researchers inexperienced with metadata creation. Over time, we have engaged in interviews and data description sessions with research groups in the Social Sciences, identifying a manageable DDI subset. A recent Clinical Psychology project, TOGETHER, dealing with risk assessment for hereditary cancer, considered the inclusion of a DDI subset for the production of metadata that are timely and interoperable with data publication initiatives in the same domain. Taking a DDI subset identified by the data curators, we make a preliminary assessment of its use as a realistic effort on the part of researchers, taking into consideration the metadata created in two data description sessions, the effort involved, and overall metadata quality. 
A follow-up questionnaire was used to assess the perspectives of researchers regarding data description.","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44750458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Welcome to the first issue of IASSIST Quarterly for the year 2023 - IQ vol. 47(1). The last article in this issue has in the title the FAIR acronym that stands for Findable, Accessible, Interoperable, and Reusable. These are the concepts most often focused on by our articles in the IQ and FAIR has an extra emphasis in this issue. The first article introduces and demonstrates a shared vocabulary for data points where the need arose after confusions about data and metadata. Basically, I find that the most valuable virtue of well-structured data – I deliberately use a fuzzy term to save you from long excursions here in the editor's notes – is that other well-structured data can benefit from use of the same software. Similarly, well-structured metadata can benefit from the same software. I also see this as the driver for the second article, on time series data and description. Sometimes, the software mentioned is the same software in both instances as metadata is treated as data or vice versa. This allows for new levels of data-driven machine actions. These days universities are busy investigating and discussing the latest chatbots. I find many of the approaches restrictive and prefer to support the inclusive ones. Likewise, I also expect and look forward to bots having great relevance for the future implementation of FAIR principles. The first article is on data and metadata by George Alter, Flavio Rizzolo, and Kathi Schleidt and has the title ‘View points on data points: A shared vocabulary for cross-domain conversations on data and metadata’. The authors have observed that sharing data across scientific domains is often impeded by differences in the language used to describe data and metadata. To avoid confusion, the authors develop a terminology. Part of the confusion concerns disagreement about the boundaries between data and metadata; and that what is metadata in one domain can be data in another. 
The shift between data and metadata is what they name as ‘semantic transposition’. I find that such shifts are a virtue and a strength and as the authors say, there is no fixed boundary between data and metadata, and both can be acted upon by people and machines. The article draws on and refers to many other standards and developments, most cited are the data model of Observations and Measurements (ISO 19156) and tools of the Data Documentation Initiative’s Cross Domain Integration (DDI-CDI). The article is thorough and explanatory with many examples and diagrams for learning, including examples of transformations between the formats: wide, long, and multidimensional. The long format of entity-attribute-value has the value domain restricted by the attribute, and in examples time and source are added, which demonstrates how further metadata enter the format. When transposing to the wide format, this is a more familiar data matrix where the same value domain applies to the complete column. The multidimensional format with facets is for most reade
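The multidimensional format with facets is, for most readers, the familiar aggregation published by statistical agencies. The authors believe that their domain-independent vocabulary supports cross-domain conversations. George Alter is Research Professor Emeritus at the Institute for Social Research at the University of Michigan, and Flavio Rizzolo is Senior Data Science Architect at Statistics Canada. Kathi Schleidt is a data scientist and the founder of DataCove.
The discussion of formats in the first article is also a focus of the second paper, 'Modernizing data management at the US Bureau of Labor Statistics'. The US Bureau of Labor Statistics (BLS) focuses on time series, and Daniel W. Gillman and Clayton Waring (both from BLS) view time-series data as a combination of three components: a measure element; a person, places, and things element (PPT); and a time element. In the paper, Gillman and Waring also describe the conceptual model (UML) and the design and features of the system. First, they review the history from the 1970s and Codd's relational model, as well as standards developed and refined after 2000. You will not be surprised to find that the Data Documentation Initiative's Cross-Domain Integration (DDI-CDI) is also among these references. The stated mission is to find a simple and intuitive way to store and organize statistical data, with the goal of making the data easy to find and use. A semantic approach is taken, that is, a focus on the meaning of the data based on the measure / person-places-things / time model. Detailed examples show how PPT is classified by dimensions, for example 'nurse' in the Standard Occupational Classification and 'hospital' in the North American Industry Classification System. Like the first paper, this paper also refers to multidimensional structures. The modernization described by the BLS is expected to be released in early 2023.
The third paper is written by João Aguiar Castro, Joana Rodrigues, Paula Mena Matos, Célia Sales, and Cristina Ribeiro, all affiliated with the University of Porto. Like the preceding articles, this paper refers to the Data Documentation Initiative (DDI), and it focuses on the concepts behind the FAIR acronym: Findable, Accessible, Interoperable, and Reusable. The title is 'Getting in touch with metadata: a DDI subset for FAIR metadata production in clinical psychology'. Clinical psychology is not a field that often appears in the IASSIST Quarterly, but it turns out that the project description starts from interviews and data description sessions with research groups in the Social Sciences, carried out to identify a manageable DDI subset. The project also draws on other projects: TAIL, TOGETHER, and Dendro. The TAIL project concerns integrated metadata tools in the research workflow and assesses the requirements of researchers from different domains. TOGETHER is a project in the field of psycho-oncology and family-centered care for hereditary cancer. Because most researchers are inexperienced with metadata, the focus is on a DDI subset, which means that FAIR metadata can be made available for deposit. Support for researchers is essential, because they have the domain expertise to create very detailed descriptions. Data curators, on the other hand, can ensure that the metadata follow the FAIR principles. This is achieved by embedding the Dendro platform in the research workflow, where metadata creation is carried out as an incremental description of the data. The paper includes screenshots of the user interface showing the choice of vocabulary. The approach and the adoption of a DDI subset produced more comprehensive metadata than are usually available.
Submissions of papers for the IASSIST Quarterly are always very welcome. We welcome input from IASSIST conferences or other conferences and workshops, from local presentations, or papers written especially for the IQ. When you are preparing such a presentation, give a thought to turning your one-time presentation into a lasting contribution. Doing that after the event also gives you the opportunity to improve your work after feedback. We encourage you to log in or create an author profile at https://www.iassistquarterly.com (our Open Journal System application). We permit authors to have 'deep links' into the IQ as well as to deposit papers in your local repository. Chairing a conference session or workshop with the purpose of aggregating and integrating papers for a special issue of the IQ is also much appreciated, as the information then reaches many more people than the limited number of session participants and is readily available on the IASSIST Quarterly website at https://www.iassistquarterly.com. Authors are very welcome to take a look at the instructions and layout: https://www.iassistquarterly.com/index.php/iassist/about/submissions
Authors can also contact me directly via e-mail: kbr@sam.sdu.dk. Should you be interested in compiling a special issue for the IQ as guest editor, I will also be delighted to hear from you.
Karsten Boye Rasmussen, March 2023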
{"title":"Editor's notes: FAIR BOT. As metadata is data is metadata is data ...","authors":"K. Rasmussen","doi":"10.29173/iq1086","DOIUrl":"https://doi.org/10.29173/iq1086","url":null,"abstract":"Welcome to the first issue of IASSIST Quarterly for the year 2023 - IQ vol. 47(1). \u0000The last article in this issue has in the title the FAIR acronym that stands for Findable, Accessible, Interoperable, and Reusable. These are the concepts most often focused on by our articles in the IQ and FAIR has an extra emphasis in this issue. The first article introduces and demonstrates a shared vocabulary for data points where the need arose after confusions about data and metadata. Basically, I find that the most valuable virtue of well-structured data – I deliberately use a fuzzy term to save you from long excursions here in the editor's notes – is that other well-structured data can benefit from use of the same software. Similarly, well-structured metadata can benefit from the same software. I also see this as the driver for the second article, on time series data and description. Sometimes, the software mentioned is the same software in both instances as metadata is treated as data or vice versa. This allows for new levels of data-driven machine actions. These days universities are busy investigating and discussing the latest chatbots. I find many of the approaches restrictive and prefer to support the inclusive ones. Likewise, I also expect and look forward to bots having great relevance for the future implementation of FAIR principles. \u0000The first article is on data and metadata by George Alter, Flavio Rizzolo, and Kathi Schleidt and has the title ‘View points on data points: A shared vocabulary for cross-domain conversations on data and metadata’. The authors have observed that sharing data across scientific domains is often impeded by differences in the language used to describe data and metadata. To avoid confusion, the authors develop a terminology. 
Part of the confusion concerns disagreement about the boundaries between data and metadata; and that what is metadata in one domain can be data in another. The shift between data and metadata is what they name as ‘semantic transposition’. I find that such shifts are a virtue and a strength and as the authors say, there is no fixed boundary between data and metadata, and both can be acted upon by people and machines. The article draws on and refers to many other standards and developments, most cited are the data model of Observations and Measurements (ISO 19156) and tools of the Data Documentation Initiative’s Cross Domain Integration (DDI-CDI). The article is thorough and explanatory with many examples and diagrams for learning, including examples of transformations between the formats: wide, long, and multidimensional. The long format of entity-attribute-value has the value domain restricted by the attribute, and in examples time and source are added, which demonstrates how further metadata enter the format. When transposing to the wide format, this is a more familiar data matrix where the same value domain applies to the complete column. The multidimensional format with facets is for most reade","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69787768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The US Bureau of Labor Statistics (BLS) is undertaking initiatives to improve its data and metadata systems. Examples include planning for the replacement of the public-facing LABSTAT data query system and efforts within the Office of Productivity and Technology (OPT) to combine multiple production systems within a single cross-divisional database platform. BLS views time-series data as a combination of three elemental components found in every time series: a measure element; a person, places, and things element; and a time element. The authors turned this basic approach into a formal conceptual model represented in UML (Unified Modeling Language). The UML model describes a flexible multi-dimensional data structure, of which time series are one kind, and supports any kind of query into the data. OPT has adopted the model, and it is guiding their approach moving forward. The model was also adopted by the Financial Industry Business Ontology project under the Object Management Group and by the Data Documentation Initiative Cross-Domain Integration (DDI-CDI) development project. There are other similarities between the OPT effort and DDI-CDI as well. In this way, the OPT project demonstrates the feasibility and usefulness of many of the ideas in DDI-CDI. In this paper we describe the time-series formulation and the UML conceptual model. We then describe the design of the OPT system and some of its features, noting where they resemble DDI-CDI. In doing so, we provide a thorough understanding of the structure of time series.
{"title":"Modernizing data management at the US Bureau of Labor Statistics","authors":"Daniel W. Gillman, Clayton Waring","doi":"10.29173/iq1038","DOIUrl":"https://doi.org/10.29173/iq1038","url":null,"abstract":"The US Bureau of Labor Statistics (BLS) is undertaking initiatives to improve its data and metadata systems. Planning for the replacement of the public facing LABSTAT data query system and efforts within the Office of Productivity and Technology to combine multiple production systems within a single cross-divisional database platform are examples. BLS views time-series data as a combination of three elemental components found in every time-series. A measure element; a person, places, and things element; and a time element are the components. The authors turned this basic approach into a formal conceptual model represented in UML (Unified Modeling Language). The UML model describes a flexible multi-dimensional data structure, of which time-series are a kind, and supports any kind of query into the data. The Office of Productivity and Technology has adopted the model, and it is guiding their approach moving forward. The model was also adopted by the Financial Industry Business Ontology project under the Object Management Group and by the Data Documentation Initiative Cross-Domain Integration (DDI-CDI) development project. There are other similarities between the OPT effort and DDI-CDI as well. In this way, the OPT project demonstrates the feasibility and usefulness of many of the ideas in DDI-CDI. In this paper we describe the time-series formulation and the UML conceptual model. Then, the design of the OPT system and some of its features are described, relating those that are like DDI-CDI where appropriate. 
In doing so, we provide a thorough understanding of the structure of time-series.","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46238134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sharing data across scientific domains is often impeded by differences in the language used to describe data and metadata. We argue that disagreements over the boundary between data and metadata are a common source of confusion. Information appearing as data in one domain may be considered metadata in another domain, a process that we call “semantic transposition.” To promote greater understanding, we develop new terminology for describing how data and metadata are structured, and we show how it can be applied to a variety of widely used data formats. Our approach builds upon previous work, such as the Observations and Measurements (ISO 19156) data model. We rely on tools from the Data Documentation Initiative’s Cross Domain Integration (DDI-CDI) to illustrate how the same data can be represented in different ways, and how information considered data in one format can become metadata in another format.
{"title":"View points on data points: A shared vocabulary for cross-domain conversations on data and metadata","authors":"George Alter, Flavio Rizzolo, K. Schleidt","doi":"10.29173/iq1051","DOIUrl":"https://doi.org/10.29173/iq1051","url":null,"abstract":"Sharing data across scientific domains is often impeded by differences in the language used to describe data and metadata. We argue that disagreements over the boundary between data and metadata are a common source of confusion. Information appearing as data in one domain may be considered metadata in another domain, a process that we call “semantic transposition.” To promote greater understanding, we develop new terminology for describing how data and metadata are structured, and we show how it can be applied to a variety of widely used data formats. Our approach builds upon previous work, such as the Observations and Measurements (ISO 19156) data model. We rely on tools from the Data Documentation Initiative’s Cross Domain Integration (DDI-CDI) to illustrate how the same data can be represented in different ways, and how information considered data in one format can become metadata in another format.","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48708553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
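As an illustration of the "semantic transposition" described in the abstract above, the following minimal Python sketch pivots a long (entity-attribute-value) table into a wide one and back. It is not taken from the paper and does not use DDI-CDI tooling; the records, field names, and the helper functions `long_to_wide` and `wide_to_long` are hypothetical and chosen only to show how a value in the attribute column of one format becomes a column label, that is, metadata, in the other.

```python
from collections import defaultdict

# Hypothetical long-format (entity-attribute-value) records; all values are made up.
long_rows = [
    ("person_1", "age", 34),
    ("person_1", "income", 52000),
    ("person_2", "age", 41),
    ("person_2", "income", 61000),
]


def long_to_wide(rows):
    """Pivot entity-attribute-value rows into one record per entity."""
    wide = defaultdict(dict)
    for entity, attribute, value in rows:
        # The attribute name is a data value in the long format;
        # in the wide format it becomes a column label (metadata).
        wide[entity][attribute] = value
    return dict(wide)


def wide_to_long(wide):
    """Reverse the pivot: column labels become values in the attribute column again."""
    return [
        (entity, attribute, value)
        for entity, record in wide.items()
        for attribute, value in record.items()
    ]


wide_table = long_to_wide(long_rows)
round_trip = wide_to_long(wide_table)
```

In the wide table each column shares one value domain (all ages, all incomes), whereas in the long table the value domain of each row depends on its attribute; the round trip shows that no information is lost in either direction.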
The dramatic increase in the use of technology- and algorithm-based solutions for research, economic, and policy decisions has led to a number of high-profile ethical and privacy violations in the last decade. Current disparities in academic curricula for data and computational science result in significant gaps in ethics training for the next generation of data-intensive researchers. Libraries are often called on to fill the curricular gaps in data science training for non-data science disciplines, including within the University of California (UC) system. We found that, in addition to incomplete computational training, ethics training is almost completely absent from the standard course curricula. In this report, we highlight the experiences of library data services providers in attempting to meet the need for additional training by designing and running two workshops: Ethical Considerations in Data (2021) and its sequel Data Ethics & Justice (2022). We discuss our interdisciplinary workshop approach and our efforts to highlight resources that can be used by non-experts to engage productively with these topics. Finally, we report a set of recommendations to help librarians and data science instructors incorporate data ethics concepts into curricular instruction more easily.
{"title":"A model for data ethics instruction for non-experts","authors":"L. Phan, Ibraheem Ali, S. Labou, E. Foster","doi":"10.29173/iq1028","DOIUrl":"https://doi.org/10.29173/iq1028","url":null,"abstract":"The dramatic increase in use of technological and algorithmic-based solutions for research, economic, and policy decisions has led to a number of high-profile ethical and privacy violations in the last decade. Current disparities in academic curriculum for data and computational science result in significant gaps regarding ethics training in the next generation of data-intensive researchers. Libraries are often called to fill the curricular gaps in data science training for non-data science disciplines, including within the University of California (UC) system. We found that in addition to incomplete computational training, ethics training is almost completely absent in the standard course curricula. In this report, we highlight the experiences of library data services providers in attempting to meet the need for additional training, by designing and running two workshops: Ethical Considerations in Data (2021) and its sequel Data Ethics & Justice (2022). We discuss our interdisciplinary workshop approach and our efforts to highlight resources that can be used by non-experts to engage productively with these topics. 
Finally, we report a set of recommendations for librarians and data science instructors to more easily incorporate data ethics concepts into curricular instruction.","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46414958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Despite findings highlighting the severe underrepresentation of women and minoritized groups in data science, most scholarly research has focused on new methodologies, tools, and algorithms as opposed to who data scientists are or how they learn their craft. This paper proposes that increased representation in data science can be achieved via advancing the curation of datasets and pedagogies that empower Black, Indigenous, and other minoritized people of color to enter the field. This work contributes to our understanding of the obstacles facing minoritized students in the classroom and solutions to mitigate their marginalization.
{"title":"Emancipating data science for Black and Indigenous students via liberatory datasets and curricula","authors":"T. Monroe-White","doi":"10.29173/iq1007","DOIUrl":"https://doi.org/10.29173/iq1007","url":null,"abstract":"Despite findings highlighting the severe underrepresentation of women and minoritized groups in data science, most scholarly research has focused on new methodologies, tools, and algorithms as opposed to who data scientists are or how they learn their craft. This paper proposes that increased representation in data science can be achieved via advancing the curation of datasets and pedagogies that empower Black, Indigenous, and other minoritized people of color to enter the field. This work contributes to our understanding of the obstacles facing minoritized students in the classroom and solutions to mitigate their marginalization.","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43434409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Welcome to the final issue of the IASSIST Quarterly for the year 2022 – IQ volume 46(4), our eagerly awaited special issue on Systemic Racism in Data Practices.
This issue represents more than you might think: the culmination of more than two years of the intellectual hard work of writing, of course, but that in itself is not unusual for any journal issue. However, the global pandemic exploded just after the conception of this special issue and hit all of us hard, wreaking not only physical destruction of lives but also unleashing social upheaval, job insecurity, housing insecurity, and major mental health challenges. Social injustice erupted during the pandemic, shocking and enraging many of us with its violence and disregard for human dignity. I was privileged to witness the genesis of this issue, and I helped recruit our guest editors, Trevor Watkins and Jonathan Cain. I salute their perseverance, patience, and courage, and that of the article authors, in bringing this content to fruition. Many involved in this issue faced multiple personal challenges, from the loss of family members to repeated moves, job changes, and more in the process of trying to get this work done. Some were unable to surmount the many obstacles and were forced to withdraw their proposals. So I do not think it is hyperbole to say this is the hardest issue we have ever produced. Trevor and Jonathan, thank you again for spearheading this important work.
Some good things have come from the societal call for racial justice for IASSIST, including this issue of the IQ. IASSIST has initiated several new ventures to advocate for diversity and equity, both within our organization and among researchers generally. We restructured our membership fees to allow half price for people joining from lower-income countries. IASSIST also sponsored diversity scholarships for members to attend the American Library Association conference and the ICPSR Summer Program in Quantitative Methods in 2022. A new Anti-racism Resources Interest Group, which focuses on compiling anti-racism resources, has been working for more than two years and recently collaborated with the Professional Development Committee to present a webinar on varying national approaches to collecting (or not collecting) data about race and ethnicity (see this page for the webinar recording as well as the essays members have written). The group welcomes contributions of essays for additional countries and suggestions of other webinar topics. Looking ahead, the 2023 conference theme is Diversity in Research: Social Justice from Data, sure to result in some fascinating presentations (and future IQ papers!). And here at the IQ, we’re already contemplating a second special issue in this area around the role of social justice in data services. We invite volunteers who would like to serve as guest editors to contact us. And so the work continues.
The IQ editorial team is happy to welcome a new volunteer, Phillip Ndhlovu,
{"title":"The work continues","authors":"Michele Hayslett","doi":"10.29173/iq1076","DOIUrl":"https://doi.org/10.29173/iq1076","url":null,"abstract":"Welcome to the final issue of the IASSIST Quarterly for the year 2022 – IQ volume 46(4), our eagerly-awaited special issue on Systemic Racism in Data Practices.\u0000This issue represents more than you might think: the culmination of more than two years of the intellectual hard work of writing, of course, but that in itself is not unusual for any journal issue. However. The global pandemic exploded just after the conception of this special issue and hit all of us hard, wreaking not only physical destruction of lives but also unleashing social upheaval, job insecurity, housing insecurity, and major mental health challenges. Social injustice erupted during the pandemic, shocking and enraging many of us with its violence and disregard for human dignity. I was privileged to witness the genesis of this issue, and I helped recruit our guest editors, Trevor Watkins and Jonathan Cain. I salute their perseverance, patience and courage, and that of the article authors, in bringing this content to fruition. Many involved in this issue faced multiple personal challenges, from the loss of family members to repeated moves, job changes, and more in the process of trying to get this work done. Some were unable to surmount the many obstacles and were forced to withdraw their proposals. So I do not think it is hyperbole to say this is the hardest issue we have ever produced. Trevor and Jonathan, thank you again for spearheading this important work.\u0000Some good things have come from the societal call for racial justice for IASSIST, including this issue of the IQ. IASSIST has initiated several new ventures to advocate for diversity and equity, both within our organization and among researchers generally: We restructured our membership fees to allow half price for people joining from lower income countries. 
IASSIST also sponsored diversity scholarships for members to attend the American Library Association conference and the ICPSR Summer Program in Quantitative Methods in 2022. A new Anti-racism Resources Interest Group which focuses on compiling anti-racism resources has been working for more than two years and recently collaborated with the Professional Development Committee to present a webinar on varying national approaches to collecting (or not collecting) data about race and ethnicity (see this page for the webinar recording as well as the essays members have written). The group welcomes contributions of essays for additional countries and suggestions of other webinar topics. Looking ahead, the 2023 conference theme is Diversity in Research: Social Justice from Data, sure to result in some fascinating presentations (and future IQ papers!). And here at the IQ, we’re already contemplating a second special issue in this area around the role of social justice in data services. We invite volunteers who would like to serve as guest editors to contact us. And so the work continues.\u0000The IQ editorial team is happy to welcome a new volunteer, Phillip Ndhlovu, ","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47768769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nastasha E. Johnson, M. Nelson, Katherine N. Yngve
Given the capitalist model of higher education that has developed since the 1980s, the data that institutions of higher education collect on students is based on micro-targeting to understand students as consumers and to retain that customer base (i.e., to prevent attrition/dropouts). Institutional data has long been collected, but the authors question how, why, and for whom the data is collected in the current higher education model. The authors then turn to the current higher education focus on equity, diversity, inclusion, and particularly on the concept of belongingness. The authors question the collective and local purposes of institutional data collection and the fallout of current practices, and argue that using existing institutional data to facilitate student belongingness is impossible with those practices. We propose a new framework of asset-minded institutional data practices that centers the student as a whole person and recenters data collection away from the concept of students as commodities. This framework, based on data feminism, intends to elevate qualitative data and all persons/experiences along the bell-shaped curve, not just the middle two quadrants.
{"title":"Deficit, asset, or whole person? Institutional data practices that impact belongingness","authors":"Nastasha E. Johnson, M. Nelson, Katherine N. Yngve","doi":"10.29173/iq1031","DOIUrl":"https://doi.org/10.29173/iq1031","url":null,"abstract":"Given the capitalist model of higher education that has developed since the 1980s, the data collected by institutions of higher education on students is based on micro-targeting to understand and retain students as consumers, and to retain that customer base (i.e. to prevent attrition/dropouts). Institutional data has long been collected but the authors will question how, why, and for whom the data is collected in the current higher education model. The authors will then turn to the current higher education focus on equity, diversity, inclusion, and particularly on the concept of belongingness in higher education. The authors question the collective and local purposes of institutional data collection and the fallout of the current practices and will argue that using existing institutional data to facilitate student belongingness is impossible with current practices. We will propose a new framework of asset-minded institutional data practices that centers the student as a whole person and recenters data collection away from the concept of students as commodities. 
We propose a new framework based on data feminism that intends to elevate qualitative data and all persons/experiences along the bell-shaped curve, not just the middle two quadrants.\u0000 ","PeriodicalId":84870,"journal":{"name":"IASSIST quarterly","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43188195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}