This poster presents a pragmatic risk assessment method based on best practices from the ISO 31000 family of risk management standards. The proposed method builds on established risk management concepts and can help a data repository gain awareness of its risks and of the costs of the controls for those risks. In simple terms, the technique supporting this method is a pragmatic risk registry that can be used to identify risks from an organization's Business Model Canvas. A Business Model Canvas is a model used in strategic management to document existing business models and develop new ones.
D. Proença, A. Nadali, J. Borbinha. "Using the Business Model Canvas to Support a Risk Assessment Method for Digital Curation." Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756965
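The poster's core technique, a risk registry keyed to Business Model Canvas elements, can be pictured as a simple data structure. The sketch below is purely illustrative: the field names, the 1-to-5 likelihood and impact scales, and the likelihood x impact risk level are common ISO 31000-style conventions, not details taken from the poster.

```python
from dataclasses import dataclass
from enum import Enum

class CanvasBlock(Enum):
    # the nine building blocks of the Business Model Canvas
    KEY_PARTNERS = "key partners"
    KEY_ACTIVITIES = "key activities"
    KEY_RESOURCES = "key resources"
    VALUE_PROPOSITIONS = "value propositions"
    CUSTOMER_RELATIONSHIPS = "customer relationships"
    CHANNELS = "channels"
    CUSTOMER_SEGMENTS = "customer segments"
    COST_STRUCTURE = "cost structure"
    REVENUE_STREAMS = "revenue streams"

@dataclass
class RiskEntry:
    description: str
    block: CanvasBlock        # canvas element the risk was identified from
    likelihood: int           # 1 (rare) .. 5 (almost certain), illustrative scale
    impact: int               # 1 (negligible) .. 5 (severe), illustrative scale
    control: str = ""
    control_cost: float = 0.0  # estimated cost of operating the control

    @property
    def level(self) -> int:
        # simple risk-matrix score: likelihood times impact
        return self.likelihood * self.impact

# hypothetical registry entry for a repository
registry = [
    RiskEntry("Single storage provider fails", CanvasBlock.KEY_PARTNERS,
              likelihood=2, impact=5,
              control="Second replica site", control_cost=12000.0),
]
high_risks = [r for r in registry if r.level >= 10]
```

A repository could then weigh each entry's `level` against its `control_cost` when deciding which controls to fund.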
A. Hinze, Craig Taube-Schock, D. Bainbridge, Rangi Matamua, J. S. Downie
With 13,000,000 volumes comprising 4.5 billion pages of text, it is currently very difficult for scholars to locate relevant sets of documents that are useful in their research from the HathiTrust Digital Library (HTDL) using traditional lexically based retrieval techniques. Existing document search tools and document clustering approaches use purely lexical analysis, which cannot address the inherent ambiguity of natural language. A semantic search approach offers the potential to overcome the shortcomings of lexical search, but even if an appropriate network of ontologies could be decided upon, it would require a full semantic markup of each document. In this paper, we present a conceptual design and report on the initial implementation of a new framework that affords the benefits of semantic search while minimizing the problems associated with applying existing semantic analysis at scale. Our approach avoids the need for complete semantic document markup using pre-existing ontologies by developing an automatically generated Concept-in-Context (CiC) network seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system analyzes documents by the semantics and context of their content. The disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. Our method achieves a form of semantic-enhanced search that simultaneously exploits the proven scale benefits provided by lexical indexing.
A. Hinze, Craig Taube-Schock, D. Bainbridge, Rangi Matamua, J. S. Downie. "Improving Access to Large-scale Digital Libraries Through Semantic-enhanced Search and Disambiguation." Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756920
Due to the large volume and complexity of data, visual analytics has become increasingly helpful for exploring, interpreting, and analyzing it. The box plot is one of the most common graphical techniques for presenting and summarizing statistics. In this paper, we focus on analyzing tagging patterns by integrating visual assessment using the box plot with the Shapiro-Wilk test.
Yunseon Choi. "Analyzing Tagging Patterns by Integrating Visual Analytics with the Inferential Test." Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756962
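The pairing the paper describes, box-plot summaries combined with a Shapiro-Wilk normality test, can be sketched in a few lines. The tag-frequency data below are hypothetical, and the function is a generic illustration of the two techniques rather than the author's actual analysis pipeline.

```python
import numpy as np
from scipy import stats

def summarize_tag_counts(counts):
    """Five-number summary underlying a box plot, plus a Shapiro-Wilk
    normality test (H0: the data are normally distributed)."""
    counts = np.asarray(counts, dtype=float)
    q1, median, q3 = np.percentile(counts, [25, 50, 75])
    iqr = q3 - q1
    summary = {
        "min": counts.min(), "q1": q1, "median": median,
        "q3": q3, "max": counts.max(),
        # points beyond 1.5 * IQR from the quartiles are drawn as outliers
        "outliers": counts[(counts < q1 - 1.5 * iqr) | (counts > q3 + 1.5 * iqr)],
    }
    w_stat, p_value = stats.shapiro(counts)
    return summary, w_stat, p_value

# hypothetical per-resource tag frequencies with one heavy tagger
freqs = [3, 5, 4, 6, 5, 7, 4, 5, 6, 40]
summary, w, p = summarize_tag_counts(freqs)
# a small p-value rejects normality, which the box plot's outlier also suggests
```

Here the visual evidence (the outlier beyond the box-plot whisker) and the inferential evidence (a small Shapiro-Wilk p-value) point to the same skewed tagging pattern.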
Pucktada Treeratpituk, Madian Khabsa, C. Lee Giles
We propose a novel graph-based approach for constructing a concept hierarchy from a large text corpus. Our algorithm incorporates both statistical co-occurrence and lexical similarity in optimizing the structure of the taxonomy. To automatically generate topic-dependent taxonomies from a large text corpus, we first extract topical terms and their relationships from the corpus. The algorithm then constructs a weighted graph representing topics and their associations. A graph partitioning algorithm is then used to recursively partition the topic graph into a taxonomy. For evaluation, we apply our approach to articles, primarily in computer science, in the CiteSeerX digital library and search engine.
Pucktada Treeratpituk, Madian Khabsa, C. Lee Giles. "Automatically Generating a Concept Hierarchy with Graphs." Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756967
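The pipeline described above, weighted topic graph plus recursive partitioning, can be sketched on toy data. This is a simplified stand-in for the authors' method: it uses only co-occurrence weights (omitting their lexical-similarity component), substitutes Kernighan-Lin bisection for their unspecified partitioning algorithm, and the co-occurrence counts are invented.

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

def build_topic_graph(cooccurrence):
    """cooccurrence: {(term_a, term_b): weight} from corpus statistics."""
    g = nx.Graph()
    for (a, b), w in cooccurrence.items():
        g.add_edge(a, b, weight=w)
    return g

def partition_to_taxonomy(g, min_size=2):
    """Recursively bisect the topic graph; each recursion level becomes
    one level of the taxonomy (returned as nested lists of terms)."""
    if len(g) <= min_size:
        return sorted(g.nodes)
    left, right = kernighan_lin_bisection(g, weight="weight", seed=0)
    return [partition_to_taxonomy(g.subgraph(left).copy(), min_size),
            partition_to_taxonomy(g.subgraph(right).copy(), min_size)]

# toy co-occurrence counts for two loosely connected topic clusters
cooc = {("svm", "kernel"): 5, ("svm", "classifier"): 4, ("kernel", "classifier"): 3,
        ("parser", "grammar"): 5, ("parser", "token"): 4, ("grammar", "token"): 3,
        ("classifier", "token"): 1}
taxonomy = partition_to_taxonomy(build_topic_graph(cooc))
```

Because bisection minimizes the weight of cut edges, strongly co-occurring terms tend to stay together in the same subtree of the resulting hierarchy.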
This poster presents preliminary findings from an analysis of user tags in the domain of consumer health information. To obtain user terms, 36,205 tags covering 38 consumer health information sites were collected from delicious.com. Content analysis was applied to identify the dimensions and types of the collected tags. The preliminary findings showed that user-generated tags cover a variety of aspects of health information, ranging from general terms and subject terms to knowledge type and audience. General terms and subject terms were dominant, accounting for 31.7% and 22.8% of tags, respectively.
Soohyung Joo, Yunseon Choi. "Content Analysis of Social Tags Generated by Health Consumers." Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756959
Lulwah M. Alkwai, Michael L. Nelson, Michele C. Weigle
It has long been anecdotally known that web archives and search engines favor Western and English-language sites. In this paper we quantitatively explore how well Arabic-language web sites are indexed and archived. We began by sampling 15,092 unique URIs from three different website directories: DMOZ (multi-lingual), and Raddadi and Star28 (both primarily Arabic language). Using language identification tools, we eliminated pages not in Arabic (e.g., English-language versions of Al-Jazeera sites) and culled the collection to 7,976 definitely Arabic-language web pages. We then used these 7,976 pages and crawled the live web and web archives to produce a collection of 300,646 Arabic-language pages. We discovered that: 1) 46% are not archived and 31% are not indexed by Google (www.google.com); 2) only 14.84% of the URIs had an Arabic country-code top-level domain (e.g., .sa) and only 10.53% had a GeoIP in an Arabic country; 3) having either only an Arabic GeoIP or only an Arabic top-level domain appears to negatively impact archiving; 4) most of the archived pages are near the top level of the site, and deeper links into the site are not well archived; 5) presence in a directory positively impacts indexing, and presence in the DMOZ directory specifically positively impacts archiving.
Lulwah M. Alkwai, Michael L. Nelson, Michele C. Weigle. "How Well Are Arabic Websites Archived?" Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756912
The digital library has been on a seemingly insatiable quest for “innovation” for decades. This focus permeates our field, usually in the guise of transforming digital library practices. The themes change over time (e.g., Federating library collections! Digital humanities! Digital preservation! Big data!), but dependably, digital library research projects on “innovation” topics are seeded in abundance each year. Researchers are rewarded (and funded) for their big, experimental ideas, not for successful applications of innovations in practice. Gearing resources toward “innovation” alone prizes the unique or novel approach above the cultivation of our field. Few innovations ever flower and thrive beyond their initial moments in the sun. What might happen if digital libraries shift their focus from the innovative solution to the process of using innovations within networks to actively facilitate system-wide change? Drawing from the disciplines of sociology and economics, Skinner will explore both established and emergent models for system-wide transformation, ultimately asking what digital libraries could accomplish as a field if we shifted our focus from “innovation” to “impact.”
Katherine Skinner. "Moving the Needle: From Innovation to Impact." Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756408
Luis Meneses, S. Jayarathna, R. Furuta, F. Shipman
It is not unusual for digital collections to degrade and suffer from problems associated with unexpected change. In an analysis of the ACM conference list, we found that categorizing the degree of change affecting a digital collection over time is a difficult task. More specifically, we found that categorizing this degree of change is not a binary problem where documents are either unchanged or they have changed so dramatically that they do not fit within the scope of the collection. It is, in part, a characterization of the intent of the change. In this work, we examine and categorize the various degrees of change that digital documents endure within the boundaries of an institutionally managed repository.
Luis Meneses, S. Jayarathna, R. Furuta, F. Shipman. "Grading Degradation in an Institutionally Managed Repository." Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756966
Madian Khabsa, Pucktada Treeratpituk, C. Lee Giles
While many clustering techniques have been successfully applied to the person name disambiguation problem, most do not address two main practical issues: allowing constraints to be added to the clustering process, and allowing data to be added incrementally without re-clustering the entire database. Constraints can be particularly useful in a system such as a digital library, where users are allowed to make corrections to the disambiguated result. For example, a user correction on a disambiguation result specifying that a record does not belong to an author could be kept as a cannot-link constraint to be used in any future disambiguation (such as when new documents are added). Besides such user corrections, constraints also allow background heuristics to be encoded into the disambiguation process. We propose a constraint-based clustering algorithm for person name disambiguation, based on DBSCAN combined with a pairwise distance based on random forests. We further propose an extension to the density-based clustering algorithm (DBSCAN) to handle online clustering, so that the disambiguation process can be done iteratively as new data points are added. Our algorithm utilizes similarity features based on both metadata information and citation similarity. We implement two types of clustering constraints to demonstrate the concept. Experiments on the CiteSeer data show that our model can achieve 0.95 pairwise F1 and 0.79 cluster F1. The presence of constraints also consistently improves the disambiguation result across different combinations of features.
Madian Khabsa, Pucktada Treeratpituk, C. Lee Giles. "Online Person Name Disambiguation with Constraints." Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756915
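The interaction between incremental assignment and cannot-link constraints can be illustrated with a heavily simplified sketch. This is not the authors' DBSCAN extension or their random-forest distance: it is a minimal greedy assignment step with Euclidean distance, invented record IDs, and a single eps threshold, showing only how a user-supplied cannot-link constraint can override proximity when a new record arrives.

```python
import math

def add_record(clusters, cannot_link, point_id, features, eps=1.0):
    """Incrementally assign a new record to an existing cluster if it lies
    within eps of some member and no cannot-link constraint is violated;
    otherwise start a new cluster. clusters: {cluster_id: {point_id: features}}."""
    for cid, members in clusters.items():
        near = any(math.dist(features, f) <= eps for f in members.values())
        forbidden = any((point_id, pid) in cannot_link or (pid, point_id) in cannot_link
                        for pid in members)
        if near and not forbidden:
            members[point_id] = features
            return cid
    new_cid = max(clusters, default=-1) + 1
    clusters[new_cid] = {point_id: features}
    return new_cid

clusters = {}
# hypothetical user correction: rec1 and rec3 are different authors
cannot_link = {("rec1", "rec3")}
add_record(clusters, cannot_link, "rec1", (0.0, 0.0))
add_record(clusters, cannot_link, "rec2", (0.5, 0.0))
# rec3 is close to cluster 0, but the constraint forces a separate cluster
cid = add_record(clusters, cannot_link, "rec3", (0.6, 0.1))
```

Because records are processed one at a time, corrections accumulated from users take effect on all future additions without re-clustering the existing database.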
The assignment of subject metadata to music is useful for organizing and accessing digital music collections. Since manual subject annotation of large-scale music collections is labor-intensive, automatic methods are preferred. Topic modeling algorithms can be used to automatically identify latent topics from appropriate text sources. Candidate text sources such as song lyrics are often too poetic, resulting in lower-quality topics. Users' interpretations of song lyrics provide an alternative source. In this paper, we propose a system that automatically discovers topics in web-mined, user-generated interpretations of songs in order to provide subject access to a music digital library. We also propose and evaluate filtering techniques to identify high-quality topics. In our experiments, we use 24,436 popular songs that exist in both the Million Song Dataset and songmeanings.com. Topic models are generated using Latent Dirichlet Allocation (LDA). To evaluate the coherence of learned topics, we calculate the Normalized Pointwise Mutual Information (NPMI) of the top ten words in each topic, based on occurrences in Wikipedia. Finally, we evaluate the resulting topics using a subset of 422 songs that have been manually assigned to six subjects. Using this system, 71% of the manually assigned subjects were correctly identified. These results demonstrate that topic modeling of song interpretations is a promising method for subject metadata enrichment in music digital libraries. It also has implications for affording similar access to collections of poetry and fiction.
Kahyun Choi, Jin Ha Lee, C. Willis, J. S. Downie. "Topic Modeling Users' Interpretations of Songs to Inform Subject Access in Music Digital Libraries." Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015-06-21. DOI: 10.1145/2756406.2756936
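The NPMI coherence measure used above has a compact closed form: NPMI(a, b) = log(p(a,b) / (p(a) p(b))) / (-log p(a,b)), which ranges from -1 (never co-occur) to 1 (always co-occur). The sketch below computes it from document-occurrence sets; the toy corpus stands in for the Wikipedia occurrence statistics the paper uses, and the aggregation (mean pairwise NPMI over a topic's top words) is one common convention, not necessarily the paper's exact variant.

```python
import math
from itertools import combinations

def npmi(word_docs, a, b, n_docs):
    """Normalized Pointwise Mutual Information of words a and b.
    word_docs: {word: set of ids of documents containing that word}."""
    p_a = len(word_docs[a]) / n_docs
    p_b = len(word_docs[b]) / n_docs
    p_ab = len(word_docs[a] & word_docs[b]) / n_docs
    if p_ab == 0:
        return -1.0  # words never co-occur: minimum NPMI
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)

def topic_coherence(word_docs, top_words, n_docs):
    """Mean pairwise NPMI over a topic's top words (higher = more coherent)."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(word_docs, a, b, n_docs) for a, b in pairs) / len(pairs)

# toy occurrence statistics standing in for Wikipedia
word_docs = {"love": {0, 1, 2}, "heart": {0, 1, 3}, "tax": {4}}
score = topic_coherence(word_docs, ["love", "heart"], n_docs=5)
```

Topics whose top words frequently appear in the same reference documents score near 1, which is what the paper's filtering step exploits to discard low-quality topics.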