Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204875
S. Ye, F. Makedon, T. Steinberg, Li Shen, J. Ford, Yuhang Wang, Yan Zhao, S. Kapidakis
We introduce SCENS, a secure content exchange negotiation system suitable for the exchange of private digital data that reside in distributed digital repositories. SCENS is an open negotiation system designed for flexibility, security, and scalability. It is currently being designed to support data sharing in scientific research by providing incentives and goals specific to a research community. However, it can easily be extended to other communities, such as government, commercial, and other types of exchanges. It is a trusted third-party software infrastructure that enables independent entities to interact and conduct multiple forms of negotiation.
Title: SCENS: a system for the mediated sharing of sensitive data
Published in: 2003 Joint Conference on Digital Libraries, 2003. Proceedings.
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204880
Sarah L. Shreeves, Christine M. Kirkham, J. Kaczmarek, Timothy W. Cole
The Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH) facilitates efficient interoperability between digital collections, in particular by enabling service providers to construct, with relatively modest effort, search portals that present aggregated metadata to specific communities. We describe the experiences of the University of Illinois at Urbana-Champaign Library as an OAI service provider. We discuss the creation of a search portal to an aggregation of metadata describing cultural heritage resources. We examine several key challenges posed by the aggregated metadata and present preliminary findings of a pilot study of the utility of the portal for a specific community (student teachers). We also comment briefly on the potential for using text analysis tools to uncover themes and relationships within the aggregated metadata.
Title: Utility of an OAI service provider search portal
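As a rough illustration of what an OAI service provider does when it harvests metadata, the sketch below builds an OAI-PMH `ListRecords` request URL. The function name `list_records_url` and the base URL are hypothetical and are not taken from the paper; the argument names (`verb`, `metadataPrefix`, `set`, `resumptionToken`) are those defined by the OAI-PMH protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            # Optional selective harvesting by set.
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder repository address, used only for illustration.
first_page = list_records_url("http://example.org/oai")
next_page = list_records_url("http://example.org/oai", resumption_token="abc")
```

A harvester would fetch `first_page`, parse the returned XML, and keep requesting pages with the returned resumption token until the repository stops supplying one.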
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204885
D. Bainbridge, John Thompson, I. Witten
People who create digital libraries need to gather together the raw material, add metadata as necessary, and design and build new collections. We set out the requirements for these tasks and describe a new tool that supports them interactively, making it easy for users to create their own collections from electronic files of all types. The process involves selecting documents for inclusion, coming up with a suitable metadata set, assigning metadata to each document or group of documents, designing the form of the collection in terms of document formats, searchable indexes, and browsing facilities, building the necessary indexes and data structures, and putting the collection in place for others to use. Moreover, different situations require different workflows, and the system must be flexible enough to cope with these demands. Although the tool is specific to the Greenstone digital library software, the underlying ideas should prove useful in more general contexts.
Title: Assembling and enriching digital library collections
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204920
J. Wood, Caitlin M. H. Shaw
CephSchool is based on CephBase and takes the information present in CephBase's digital libraries and redirects it towards students and teachers. CephSchool is organized into eight arms and contains information about cephalopods, discussion topics, teacher support, and student assessment techniques. Together, these provide an accurate, inquiry-based learning environment in which students learn basic biological concepts using cephalopods as the subject organism, through a dynamic Web page that is updated as new information becomes available.
Title: CephSchool: a pedagogic portal for teaching biological principles with cephalopod molluscs
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204859
S. Southwick, R. Southwick
We describe the background context and initial findings from an ongoing case study of an electronic theses and dissertations (ETD) digital library (DL) project in Brazil. The case study focuses on the activities of a Brazilian government agency acting as a mediator between software developers (primarily academic institutions in the United States) and university clients in Brazil. We highlight the loosely integrated nature of the DL technology and the uncertain relationship between developers and users in terms of support. These circumstances reinforce a view of technology transfer as a process of organizational learning. As a consequence, the mediating institution in the study is viewed as assuming multiple roles in advancing the project.
Title: Learning digital library technology across borders
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204889
M. Droettboom
We present a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.
Title: Correcting broken characters in the recognition of historical printed documents
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204848
Yueyu Fu, Javed Mostafa, Kazuhiro Seki
Protein association discovery can directly contribute toward developing protein pathways; hence it is a significant problem in bioinformatics. LUCAS (Library of User-Oriented Concepts for Access Services) was designed to automatically extract and determine associations among proteins from biomedical literature. Such a tool has notable potential to automate database construction in biomedicine, instead of relying on experts' analysis. We report on the mechanisms for automatically generating clusters of proteins. A formal evaluation of the system, based on a subset of 2000 MEDLINE titles and abstracts, has been conducted against the Swiss-Prot database, in which the associations among concepts are entered manually by experts.
Title: Protein association discovery in biomedical literature
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204919
Ji-Hoon Kang, Chul-Soo Kim, Eun-Jeong Ko
A standard query language is very helpful for interoperability among digital library systems over the Internet. We propose an XQuery engine that can be used as an XQuery processing module in a digital library system that supports XML documents. We assume a generic digital library system architecture consisting of four modules: a user interface, an XQuery engine, an information retrieval engine, and an XML repository. The XQuery engine parses an input XQuery and constructs a syntax tree for the query.
Title: An XQuery engine for digital library systems
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204842
Hui Han, C. Lee Giles, Eren Manavoglu, H. Zha, Zhenyue Zhang, E. Fox
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a support vector machine classification-based method for metadata extraction from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the predicted class labels of its neighbor lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries of each line. We found that discovering and using the structural patterns of the data, together with domain-based word clustering, can improve the metadata extraction performance. An appropriate feature normalization also greatly improves the classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries Citeseer [S. Lawrence et al., (1999)] and EbizSearch [Y. Petinot et al., (2003)]. We believe it can be generalized to other digital libraries.
Title: Automatic document metadata extraction using support vector machines
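The iterative step described in this abstract, in which each line is re-classified using the labels its neighbors received in the previous round, can be sketched as follows. This is only an illustration under stated assumptions: `toy_classify` is a hypothetical rule-based stand-in for the trained support vector machines, each line receives a single label rather than one or more of 15 classes, and the label names and the `"START"`/`"END"` boundary markers are invented for the example.

```python
def iterate_labels(lines, classify, max_rounds=10):
    """Re-classify lines using neighbor labels from the previous round
    until the labelling stops changing (or max_rounds is reached)."""
    # Round 0: classify every line independently (no context available yet).
    labels = [classify(line, None, None) for line in lines]
    for _ in range(max_rounds):
        new_labels = []
        for i, line in enumerate(lines):
            prev_label = labels[i - 1] if i > 0 else "START"
            next_label = labels[i + 1] if i + 1 < len(lines) else "END"
            new_labels.append(classify(line, prev_label, next_label))
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels


def toy_classify(line, prev_label, next_label):
    """Hypothetical stand-in for an SVM: a few hand-written rules."""
    if "@" in line:
        return "email"
    if "University" in line:
        return "affiliation"
    if prev_label == "START":
        return "title"
    if next_label == "affiliation":
        return "author"
    return "other"


header = [
    "Automatic Metadata Extraction",
    "Hui Han",
    "Pennsylvania State University",
    "hhan@example.edu",
]
result = iterate_labels(header, toy_classify)
```

In the first round only the affiliation and email lines are recognizable on their own; the title and author lines are resolved once the neighbor labels become available, which is the effect the contextual rounds are meant to capture.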
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204862
Yi-Chun Chu, I. Witten, R. Lobb, D. Bainbridge
Can digital libraries provide a reading experience that more closely resembles a real book than a scrolled or paginated electronic display? We describe a prototype page-turning system that realistically animates full three-dimensional page-turns. The dynamic behavior is generated by a mass-spring model defined on a rectangular grid of particles. The prototype takes a PDF or e-book file, renders it into a sequence of PNG images representing individual pages, and animates the page-turns under user control. The simulation behaves fairly naturally, although more computer graphics work is required to perfect it.
Title: How to turn the page [digital libraries]
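A mass-spring model on a rectangular grid of particles, as mentioned in this abstract, can be sketched in a few lines. The version below is a minimal illustration and not the authors' implementation: the spring layout (horizontal and vertical neighbors only), the pinned first column standing in for the book spine, the gravity force, the integration scheme, and all parameter values are assumptions made for the example.

```python
import math


def make_page(rows, cols, spacing=1.0):
    """Create a flat grid of particles joined by springs to their neighbors."""
    positions = [[c * spacing, r * spacing, 0.0]
                 for r in range(rows) for c in range(cols)]
    velocities = [[0.0, 0.0, 0.0] for _ in positions]
    springs = []  # (particle i, particle j, rest length)
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                springs.append((i, i + 1, spacing))
            if r + 1 < rows:
                springs.append((i, i + cols, spacing))
    pinned = {r * cols for r in range(rows)}  # column 0 acts as the spine
    return positions, velocities, springs, pinned


def step(positions, velocities, springs, pinned, dt=0.01, stiffness=50.0,
         damping=0.5, mass=1.0, gravity=(0.0, 0.0, -9.8)):
    """Advance the grid one time step in place (semi-implicit Euler)."""
    # External forces: gravity plus velocity damping.
    forces = [[mass * g - damping * v for g, v in zip(gravity, vel)]
              for vel in velocities]
    # Hooke's law along each spring.
    for i, j, rest in springs:
        delta = [b - a for a, b in zip(positions[i], positions[j])]
        length = math.sqrt(sum(d * d for d in delta))
        if length == 0.0:
            continue
        magnitude = stiffness * (length - rest)
        for k in range(3):
            f = magnitude * delta[k] / length
            forces[i][k] += f
            forces[j][k] -= f
    # Update velocities first, then positions; pinned particles stay put.
    for i in range(len(positions)):
        if i in pinned:
            velocities[i] = [0.0, 0.0, 0.0]
            continue
        for k in range(3):
            velocities[i][k] += dt * forces[i][k] / mass
            positions[i][k] += dt * velocities[i][k]


positions, velocities, springs, pinned = make_page(2, 3)
step(positions, velocities, springs, pinned)
```

A page-turn animation would replace gravity with forces or constraints driven by user input and render the page image as a texture on the deformed grid; those parts are outside this sketch.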