The process of building a geospatial component to access existing materials in the Perseus Digital Library has raised interesting questions about the interaction between historical and geospatial data. The traditional methods of describing geographic features' names and locations do not provide a complete solution for historical data such as that in the Perseus Digital Library. Very often data sources for a spatial database must be created from the historical materials themselves.
{"title":"Generating and reintegrating geospatial data","authors":"Robert F. Chavez","doi":"10.1145/336597.336684","DOIUrl":"https://doi.org/10.1145/336597.336684","url":null,"abstract":"The process of building a geospatial component to access existing materials in the Perseus Digital Library has raised interesting questions about the interaction between historical and geospatial data. The traditional methods of describing geographic features' names and locations do not provide a complete solution for historical data such as that in the Perseus Digital Library. Very often data sources for a spatial database must be created from the historical materials themselves.","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79360177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study was designed to examine user beliefs and behavior on the selection and use of search features and search interfaces. Five weeks of user logs were taken from a user-targeted collection and surveys were administered immediately before and after this time period. Survey results indicate a significant correlation between a user's level of effort and their perceived benefit from that effort. Reported search feature use increased by more than 35% over the five weeks. This raises the question of how the behavior of an Internet user changes over time. Results from the log files were inconclusive but suggest a reluctance to use the advanced search interface.
{"title":"User effort in query construction and interface selection","authors":"Paul Gerwe, C. Viles","doi":"10.1145/336597.336679","DOIUrl":"https://doi.org/10.1145/336597.336679","url":null,"abstract":"This study was designed to examine user beliefs and behavior on the selection and use of search features and search interfaces. Five weeks of user logs were taken from a user-targeted collection and surveys were administered immediately before and after this time period. Survey results indicate a significant correlation between a user's level of effort and their perceived benefit from that effort. Reported search feature use increased by more than 35% over the five weeks. This raises the question of how the behavior of an Internet user changes over time. Results from the log files were inconclusive but suggest a reluctance to use the advanced search interface.","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85122916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digital library search results are usually shown as a textual list, with 10-20 items per page. Viewing several thousand search results at once on a two-dimensional display with continuous variables is a promising alternative. Since these displays can overwhelm some users, we created a simplified two-dimensional display that uses categorical and hierarchical axes, called hieraxes. Users appreciate the meaningful and limited number of terms on each hieraxis. At each grid point of the display we show a cluster of color-coded dots or a bar chart. Users see the entire result set and can then click on labels to move down a level in the hierarchy. Handling broad hierarchies and arranging for imposed hierarchies led to additional design innovations. We applied hieraxes to a digital video library of science topics used by middle school teachers, a legal information system, and a technical library using the ACM Computing Classification System. Feedback from usability testing with 32 subjects revealed strengths and weaknesses.
{"title":"Visualizing digital library search results with categorical and hierarchical axes","authors":"B. Shneiderman, David Feldman, A. Rose, X. Ferré","doi":"10.1145/336597.336637","DOIUrl":"https://doi.org/10.1145/336597.336637","url":null,"abstract":"Digital library search results are usually shown as a textual list, with 10-20 items per page. Viewing several thousand search results at once on a two-dimensional display with continuous variables is a promising alternative. Since these displays can overwhelm some users, we created a simplified two-dimensional display that uses categorical and hierarchical axes, called hieraxes. Users appreciate the meaningful and limited number of terms on each hieraxis. At each grid point of the display we show a cluster of color-coded dots or a bar chart. Users see the entire result set and can then click on labels to move down a level in the hierarchy. Handling broad hierarchies and arranging for imposed hierarchies led to additional design innovations. We applied hieraxes to a digital video library of science topics used by middle school teachers, a legal information system, and a technical library using the ACM Computing Classification System. Feedback from usability testing with 32 subjects revealed strengths and weaknesses.","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85723325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
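The grid described in the hieraxes abstract above can be illustrated with a small counting sketch. This is not the authors' implementation: the record fields (`topic`, `kind`), the sample data, and the function name `grid_counts` are all assumptions made for illustration. It shows only the core idea of counting results per cell of a categorical grid and moving down one level of a hierarchical axis.

```python
from collections import Counter

# Hypothetical result records: each carries a path in a topic hierarchy
# and a flat document kind. Field names are assumptions for illustration.
results = [
    {"topic": ("Science", "Biology"), "kind": "video"},
    {"topic": ("Science", "Biology"), "kind": "article"},
    {"topic": ("Science", "Physics"), "kind": "video"},
    {"topic": ("Law", "Contracts"), "kind": "article"},
]


def grid_counts(items, depth=1, under=()):
    """Count items per (topic label, kind) grid cell.

    `under` is the hierarchy prefix the user has drilled into;
    `depth` is how many levels of the topic path label the axis.
    """
    cells = Counter()
    for item in items:
        path = item["topic"]
        if path[:len(under)] != under:
            continue  # item lies outside the branch being viewed
        cells[(path[:depth], item["kind"])] += 1
    return cells


# Top-level view: one axis label per root category.
top_level = grid_counts(results)

# View after clicking the "Science" label: one level further down.
science = grid_counts(results, depth=2, under=("Science",))
```

Each cell count would drive the size of a dot cluster or bar at that grid point; clicking a label corresponds to calling `grid_counts` again with a longer `under` prefix.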
This paper describes KeyLinking, a framework for dynamic resolution of soft and implied hypertext links to the most appropriate available resource at the time of usage.
{"title":"KeyLinking: dynamic hypertext in a digital library","authors":"Bob Pritchett","doi":"10.1145/336597.336677","DOIUrl":"https://doi.org/10.1145/336597.336677","url":null,"abstract":"This paper describes KeyLinking, a framework for dynamic resolution of soft and implied hypertext links to the most appropriate available resource at the time of usage.","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77014793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Significant efforts are being made to digitize rare and valuable library materials, with the goal of providing patrons and historians digital facsimiles that capture the "look and feel" of the original materials. This is often done by digitally photographing the materials and making high resolution 2D images available. The underlying assumption is that the objects are flat. However, older materials may not be flat in practice, being warped and crinkled due to decay, neglect, accident and the passing of time. In such cases, 2D imaging is insufficient to capture the "look and feel" of the original. For these materials, 3D acquisition is necessary to create a realistic facsimile. This paper outlines a technique for capturing an accurate 3D representation of library materials which can be integrated directly into current digitization setups. This will allow digitization efforts to provide patrons with more realistic digital facsimiles of library materials.
{"title":"Beyond 2D images: effective 3D imaging for library materials","authors":"M. S. Brown, W. Seales","doi":"10.1145/336597.336623","DOIUrl":"https://doi.org/10.1145/336597.336623","url":null,"abstract":"Significant efforts are being made to digitize rare and valuable library materials, with the goal of providing patrons and historians digital facsimiles that capture the \"look and feel\" of the original materials. This is often done by digitally photographing the materials and making high resolution 2D images available. The underlying assumption is that the objects are flat. However, older materials may not be flat in practice, being warped and crinkled due to decay, neglect, accident and the passing of time. In such cases, 2D imaging is insufficient to capture the \"look and feel\" of the original. For these materials, 3D acquisition is necessary to create a realistic facsimile. This paper outlines a technique for capturing an accurate 3D representation of library materials which can be integrated directly into current digitization setups. This will allow digitization efforts to provide patrons with more realistic digital facsimile of library materials.","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87093604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper introduces an approach that organizes retrieval results semantically and displays them spatially for browsing. Latent Semantic Analysis and clustering techniques are applied for semantic data analysis. A modified Boltzmann algorithm is used to lay out documents in a two-dimensional space for interactive exploration. The approach was implemented to visualize retrieval results from two different databases: the Science Citation Index Expanded and the Dido Image Bank.
{"title":"Extracting and visualizing semantic structures in retrieval results for browsing","authors":"K. Börner","doi":"10.1145/336597.336672","DOIUrl":"https://doi.org/10.1145/336597.336672","url":null,"abstract":"The paper introduces an approach that organizes retrieval results semantically and displays them spatially for browsing. Latent Semantic Analysis and clustering techniques are applied for semantic data analysis. A modified Boltzmann algorithm is used to lay out documents in a two-dimensional space for interactive exploration. The approach was implemented to visualize retrieval results from two different databases: the Science Citation Index Expanded and the Dido Image Bank.","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86547424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Kumazawa, Hironori Kamada, A. Yamada, H. Hoshino, Y. Kambayashi, M. Mohania
Modification and reuse of digital content, particularly web data, has become easy due to advances in computer technology. Work created by one person (or group) can be reused by another person (or group) for a different purpose. Thus, there can be multiple copyright holders for the same information. In such a scenario, it is important to define the relationship among the copyright holders. There have been some attempts at designing an Electronic Copyright Management System (ECMS) [6, 7]. However, this system is used for defining and registering the information of only one copyright holder. Another attempt to develop copyright management systems is reported in [indecs]; its key feature is an RDF expression for copyright metadata. When multiple creators take part in creating one piece of content, or when content is created by reusing existing content, it is vital to define the relationship among copyright holders. This relationship should be clarified in order to allocate profits properly and protect the rights of all the copyright holders. Therefore, a framework is required in which the relationship among copyright holders and the allocation of profits are described. In this paper we outline the main points of such a framework and refer readers to [tr-wmu] for a detailed description. The proposed framework also represents copyright processing for multiple rights holders, and it provides a basis for realizing transaction systems in which reuse for creating new content is promoted. We also outline the conceptual model designed for describing copyright information, examining the relationship among rights holders, and modeling charge rules.
{"title":"Relationship among copyright holders for use and reuse of digital contents","authors":"M. Kumazawa, Hironori Kamada, A. Yamada, H. Hoshino, Y. Kambayashi, M. Mohania","doi":"10.1145/336597.336688","DOIUrl":"https://doi.org/10.1145/336597.336688","url":null,"abstract":"Modification and reuse of digital content, particularly web data, has become easy due to advances in computer technology. Work created by one person (or group) can be reused by another person (or group) for a different purpose. Thus, there can be multiple copyright holders for the same information. In such a scenario, it is important to define the relationship among the copyright holders. There have been some attempts at designing an Electronic Copyright Management System (ECMS) [6, 7]. However, this system is used for defining and registering the information of only one copyright holder. Another attempt to develop copyright management systems is reported in [indecs]; its key feature is an RDF expression for copyright metadata. When multiple creators take part in creating one piece of content, or when content is created by reusing existing content, it is vital to define the relationship among copyright holders. This relationship should be clarified in order to allocate profits properly and protect the rights of all the copyright holders. Therefore, a framework is required in which the relationship among copyright holders and the allocation of profits are described. In this paper we outline the main points of such a framework and refer readers to [tr-wmu] for a detailed description. The proposed framework also represents copyright processing for multiple rights holders, and it provides a basis for realizing transaction systems in which reuse for creating new content is promoted. We also outline the conceptual model designed for describing copyright information, examining the relationship among rights holders, and modeling charge rules.","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90627014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Our prototype automatic title generation system, inspired by statistical machine-translation approaches [1], treats the document title as a translation of the document. Titles can be generated without extracting words from the document. A large corpus of documents with human-assigned titles is required for training the title "translation" models. On an F1 evaluation score, our approach outperformed another approach based on Bayesian probability estimates [7].
{"title":"Automatic title generation for EM","authors":"Paul E. Kennedy, Alexander Hauptmann","doi":"10.1145/336597.336670","DOIUrl":"https://doi.org/10.1145/336597.336670","url":null,"abstract":"Our prototype automatic title generation system inspired by statistical machine-translation approaches [1] treats the document title like a translation of the document. Titles can be generated without extracting words from the document. A large corpus of documents with human-assigned titles is required for training title \"translation\" models. On an f1 evaluation score our approach outperformed another approach based on Bayesian probability estimates [7].","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73503114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
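The title generation abstract above reports results on an F1 score without defining it. One common way to score a generated title against a human-assigned one is word-overlap F1, sketched below. The choice of lowercased whitespace tokens and the function name `title_f1` are assumptions for illustration; the paper may compute its score differently.

```python
from collections import Counter


def title_f1(generated, reference):
    """Word-overlap F1 between a generated and a reference title.

    Tokens are lowercased whitespace-separated words (an assumption);
    repeated words are matched at most as often as they occur in both.
    """
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((gen & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Two of three generated words appear among the four reference words,
# so precision is 2/3 and recall is 2/4.
score = title_f1("automatic title generation",
                 "automatic generation of titles")
```

Because the score is the harmonic mean of precision and recall, a system cannot do well by emitting very long titles (low precision) or very short ones (low recall).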
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, which in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
{"title":"Snowball: extracting relations from large plain-text collections","authors":"Eugene Agichtein, L. Gravano","doi":"10.1145/336597.336644","DOIUrl":"https://doi.org/10.1145/336597.336644","url":null,"abstract":"Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, which in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74993470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
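The iterative loop described in the Snowball abstract above (seed examples generate patterns, patterns are scored without human input, reliable patterns extract new tuples) can be sketched in a few lines. This is a deliberately simplified toy, not the Snowball system: it assumes entities are single capitalized words, uses the literal text between two entities as the "pattern", and scores a pattern by the fraction of its matches that agree with already known pairs. The sample sentences, the threshold `min_conf`, and all function names are assumptions for illustration.

```python
import re

# Toy corpus; sentences and the seed pair are invented for illustration.
sentences = [
    "Microsoft is based in Redmond.",
    "Boeing is based in Seattle.",
    "Microsoft and Redmond officials met.",
    "Microsoft and Intel signed a deal.",
]
seeds = {"Microsoft": "Redmond"}  # organization -> location

ENTITY = r"([A-Z]\w+)"  # assumption: an entity is one capitalized word


def matches(pattern, corpus):
    """Return every (left, right) entity pair joined by `pattern`."""
    regex = re.compile(ENTITY + re.escape(pattern) + ENTITY)
    return [pair for s in corpus for pair in regex.findall(s)]


def bootstrap(corpus, seed_pairs, min_conf=0.6, iterations=2):
    known = dict(seed_pairs)
    for _ in range(iterations):
        # 1. Text between a known pair becomes a candidate pattern.
        candidates = set()
        for s in corpus:
            for org, loc in known.items():
                m = re.search(re.escape(org) + r"(.+?)" + re.escape(loc), s)
                if m:
                    candidates.add(m.group(1))
        # 2. Keep patterns whose matches mostly agree with known pairs.
        reliable = []
        for p in candidates:
            pos = neg = 0
            for org, loc in matches(p, corpus):
                if org in known:
                    if known[org] == loc:
                        pos += 1
                    else:
                        neg += 1
            if pos + neg > 0 and pos / (pos + neg) >= min_conf:
                reliable.append(p)
        # 3. Reliable patterns contribute new pairs for the next round.
        for p in reliable:
            for org, loc in matches(p, corpus):
                known.setdefault(org, loc)
    return known


table = bootstrap(sentences, seeds)
```

In this toy run the pattern " is based in " agrees with the seed and adds the Boeing pair, while " and " is discarded because one of its two matches contradicts the seed, so the spurious (Microsoft, Intel) pairing never enters the table.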
This paper reports on measurements of the NCSTRL digital library taken over a two-year period. We report the growth of the system along two dimensions: number of participating institutions and number of documents indexed by the system. We also report an aspect of reliability for this distributed digital library system.
{"title":"Growth and server availability of the NCSTRL digital library","authors":"Allison L. Powell, J. French","doi":"10.1145/336597.336696","DOIUrl":"https://doi.org/10.1145/336597.336696","url":null,"abstract":"This paper reports on measurements of the NCSTRL digital library taken over a two-year period. We report the growth of the system along two dimensions: number of participating institutions and number of documents indexed by the system. We also report an aspect of reliability for this distributed digital library system.","PeriodicalId":42447,"journal":{"name":"Digital Library Perspectives","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73632060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}