Assembling and enriching digital library collections
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204885
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
D. Bainbridge, John Thompson, I. Witten
People who create digital libraries need to gather together the raw material, add metadata as necessary, and design and build new collections. We set out the requirements for these tasks and describe a new tool that supports them interactively, making it easy for users to create their own collections from electronic files of all types. The process involves selecting documents for inclusion, coming up with a suitable metadata set, assigning metadata to each document or group of documents, designing the form of the collection in terms of document formats, searchable indexes, and browsing facilities, building the necessary indexes and data structures, and putting the collection in place for others to use. Moreover, different situations require different workflows, and the system must be flexible enough to cope with these demands. Although the tool is specific to the Greenstone digital library software, the underlying ideas should prove useful in more general contexts.
CephSchool: a pedagogic portal for teaching biological principles with cephalopod molluscs
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204920
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
J. Wood, Caitlin M. H. Shaw
CephSchool is based on CephBase and takes the information present in CephBase's digital libraries and redirects it towards students and teachers. CephSchool is organized into eight arms and contains information about cephalopods, discussion topics, teacher support, and student assessment techniques. These provide an accurate, inquiry-based learning environment in which students learn basic biological concepts using cephalopods as the subject organism, delivered through a dynamic Web page that is updated as new information becomes available.
Leveraging a common representation for personalized search and summarization in a medical digital library
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204856
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
K. McKeown, Noémie Elhadad, V. Hatzivassiloglou
Despite the large amount of online medical literature, it can be difficult for clinicians to find relevant information at the point of patient care. We present techniques to personalize the results of search, making use of the online patient record as a sophisticated, preexisting user model. Our work in PERSIVAL, a medical digital library, includes methods for reranking the results of search to prioritize those that better match the patient record. It also generates summaries of the reranked results, which highlight information that is relevant to the patient under the physician's care. We focus on the use of a common representation for the articles returned by search and the patient record, which facilitates both the reranking and the summarization tasks. This common approach to both tasks has a strong positive effect on the ability to personalize information.
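The reranking idea in the PERSIVAL abstract above (score each search result against the patient record in one shared representation) can be illustrated with a minimal sketch. The bag-of-words vectors and cosine similarity used here are assumptions chosen for brevity; the abstract does not say which representation or scoring function PERSIVAL uses, and the names `term_vector`, `cosine`, and `rerank` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts: a stand-in for the shared representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(articles, patient_record):
    """Order articles by similarity to the patient record, best match first."""
    profile = term_vector(patient_record)
    return sorted(
        articles,
        key=lambda article: cosine(term_vector(article), profile),
        reverse=True,
    )
```

Because articles and the patient record pass through the same `term_vector` function, the same vectors could in principle also feed a summarizer, which is the benefit of a common representation that the abstract points to.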
MetaTest: evaluation of metadata from generation to use
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204917
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
E. Liddy, Eileen Allen, Christina M. Finneran, Geri Gay, H. Hembrooke, Laura A. Granka
We are studying metadata from its initial generation to its use in accessing desired educational resources. With a testbed of lesson plans and activities, we are comparing manually and automatically generated metadata for their retrieval effectiveness (i.e., the ability to retrieve the most relevant resources); conducting a subjective evaluation of manually and automatically generated metadata as representations of the resource, as judged by subject matter experts; and conducting studies of users' search and navigation behavior when accessing the digital library. These evaluations successfully combine what we believe are necessary foci on how and whether metadata affects user and system performance.
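Retrieval effectiveness, as mentioned in the MetaTest abstract above, is commonly quantified with precision and recall. The sketch below shows that standard calculation; the abstract does not state which measures the study actually reports, so treating precision and recall as the comparison metric is an assumption, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: resource ids returned by a search over one metadata set
    relevant:  resource ids judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Running the same queries against manually and automatically generated metadata and comparing the resulting pairs gives one simple way to contrast the two.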
A system for building expandable digital libraries
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204886
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
D. Castelli, P. Pagano
Expandability is one of the main requirements of future digital libraries. We introduce a digital library service system, OpenDLib, that has been designed to be highly expandable in terms of content, services and usage. We illustrate the mechanisms that enable expandability and discuss their impact on the development of the system architecture.
Correcting broken characters in the recognition of historical printed documents
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204889
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
M. Droettboom
We present a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.
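To make the idea of rejoining connected components concrete, here is a deliberately simplified sketch. It is not the graph-combinatorics algorithm of the paper, whose details the abstract does not give. It assumes the fragments are already ordered left to right on a text line, considers only contiguous groupings, and picks the partition whose groups receive the highest total confidence from a classifier. The function `best_grouping`, the `score` callback, and the `max_parts` limit are all hypothetical.

```python
def best_grouping(components, score, max_parts=3):
    """Partition left-to-right fragments into contiguous groups.

    components: list of fragment identifiers, in reading order
    score:      callable mapping a tuple of fragments to a confidence
                that the tuple forms one character (assumed to come
                from some character classifier)
    max_parts:  largest number of fragments allowed in one character
    """
    n = len(components)
    # best[i] holds (total score, groups) for the first i fragments.
    best = [(0.0, [])] + [None] * n
    for end in range(1, n + 1):
        candidates = []
        for start in range(max(0, end - max_parts), end):
            group = tuple(components[start:end])
            prev_score, prev_groups = best[start]
            candidates.append((prev_score + score(group), prev_groups + [group]))
        best[end] = max(candidates, key=lambda c: c[0])
    return best[n][1]
```

A full graph-based formulation would allow fragments to be joined in two dimensions (for example, a dot above a stem); the one-dimensional dynamic program here only conveys the general shape of "try candidate groupings, keep the best-scoring one".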
Protein association discovery in biomedical literature
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204848
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
Yueyu Fu, Javed Mostafa, Kazuhiro Seki
Protein association discovery can directly contribute toward developing protein pathways; hence it is a significant problem in bioinformatics. LUCAS (Library of User-Oriented Concepts for Access Services) was designed to automatically extract and determine associations among proteins from biomedical literature. Such a tool has notable potential to automate database construction in biomedicine, instead of relying on experts' analysis. We report on the mechanisms for automatically generating clusters of proteins. A formal evaluation of the system, based on a subset of 2000 MEDLINE titles and abstracts, has been conducted against the Swiss-Prot database, in which the associations among concepts are entered manually by experts.
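One simple way to surface candidate protein associations from titles and abstracts is to count how often two protein names appear in the same text. The sketch below does only that. The abstract above does not describe how LUCAS actually extracts associations or forms clusters, so co-occurrence counting is an illustrative assumption, and `cooccurrence` is a hypothetical function name.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(abstracts, protein_names):
    """Count, for each pair of proteins, the texts that mention both.

    Matching is by exact whitespace-delimited token, ignoring case;
    real protein-name recognition is considerably harder than this.
    """
    counts = Counter()
    for text in abstracts:
        tokens = set(text.lower().split())
        present = sorted(p for p in protein_names if p.lower() in tokens)
        counts.update(combinations(present, 2))
    return counts
```

Pairs with high counts could then be compared with expert-entered associations, in the spirit of the evaluation against Swiss-Prot mentioned in the abstract.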
An XQuery engine for digital library systems
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204919
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
Ji-Hoon Kang, Chul-Soo Kim, Eun-Jeong Ko
A standard query language is very helpful for interoperability among digital library systems over the Internet. We propose an XQuery engine that can be used as an XQuery processing module in a digital library system that supports XML documents. We assume a generic digital library system architecture consisting of four modules: a user interface, an XQuery engine, an information retrieval engine, and an XML repository. The XQuery engine parses an input XQuery and constructs a syntax tree for the query.
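The parsing step described in the abstract above (query text in, syntax tree out) can be illustrated with a toy example. The sketch handles only a single `for ... in ... [where ...] return ...` expression and uses a regular expression rather than a real XQuery grammar, so it is far from a conforming parser; the tree layout and the name `parse_flwor` are assumptions made for illustration and are not taken from the paper.

```python
import re

# Matches one simple FLWOR expression with an optional where clause.
FLWOR = re.compile(
    r"for\s+(?P<var>\$\w+)\s+in\s+(?P<source>.+?)"
    r"(?:\s+where\s+(?P<where>.+?))?"
    r"\s+return\s+(?P<ret>.+)$",
    re.DOTALL,
)


def parse_flwor(query):
    """Turn a restricted FLWOR query string into a small nested dict."""
    match = FLWOR.match(query.strip())
    if match is None:
        raise ValueError("unsupported query: " + query)
    tree = {
        "type": "FLWOR",
        "for": {"var": match["var"], "in": match["source"]},
        "return": match["ret"],
    }
    if match["where"]:
        tree["where"] = match["where"]
    return tree
```

In an architecture like the one in the abstract, a tree of this kind would be what the engine hands on for evaluation against the information retrieval engine and the XML repository.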
Automatic document metadata extraction using support vector machines
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204842
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
Hui Han, C. Lee Giles, Eren Manavoglu, H. Zha, Zhenyue Zhang, E. Fox
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a support vector machine classification-based method for metadata extraction from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the predicted class labels of its neighbor lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries of each line. We found that discovery and use of the structural patterns of the data and domain-based word clustering can improve the metadata extraction performance. An appropriate feature normalization also greatly improves the classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries Citeseer [S. Lawrence et al., (1999)] and EbizSearch [Y. Petinot et al., (2003)]. We believe it can be generalized to other digital libraries.
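The iterative step in the abstract above (reclassify each header line using the labels predicted for its neighbors in the previous round, until nothing changes) can be sketched independently of the classifier. In the sketch below a hand-written rule function stands in for the trained support vector machines, and each line gets a single label rather than one or more of 15 classes; both are simplifying assumptions, and `iterate_labels` and `toy_classify` are hypothetical names.

```python
def iterate_labels(lines, classify, rounds=5):
    """Refine per-line labels using neighbor labels from the previous round.

    classify(line, prev_label, next_label) returns a label; the neighbor
    labels are None in the first pass and at the edges of the header.
    """
    labels = [classify(line, None, None) for line in lines]
    for _ in range(rounds):
        updated = [
            classify(
                line,
                labels[i - 1] if i > 0 else None,
                labels[i + 1] if i + 1 < len(lines) else None,
            )
            for i, line in enumerate(lines)
        ]
        if updated == labels:  # converged: no label changed this round
            break
        labels = updated
    return labels


def toy_classify(line, prev_label, next_label):
    """Rule-based stand-in for a trained classifier (illustration only)."""
    if "@" in line:
        return "email"
    if "University" in line:
        return "affiliation"
    if len(line.split()) > 4:
        return "title"
    if prev_label == "title" or next_label == "affiliation":
        return "author"
    return "unknown"
```

With the toy rules, a short name line is "unknown" in the first pass and becomes "author" once the labels of the surrounding title and affiliation lines are available, which is the effect the contextual rounds are meant to achieve.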
How to turn the page [digital libraries]
Pub Date: 2003-05-27 DOI: 10.1109/JCDL.2003.1204862
2003 Joint Conference on Digital Libraries, 2003. Proceedings.
Yi-Chun Chu, I. Witten, R. Lobb, D. Bainbridge
Can digital libraries provide a reading experience that more closely resembles a real book than a scrolled or paginated electronic display? We describe a prototype page turning system that realistically animates full three-dimensional page-turns. The dynamic behavior is generated by a mass-spring model defined on a rectangular grid of particles. The prototype takes a PDF or e-book file, renders it into a sequence of PNG images representing individual pages, and animates the page-turns under user control. The simulation behaves fairly naturally, although more computer graphics work is required to perfect it.
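A mass-spring model on a rectangular grid, as mentioned in the abstract above, can be sketched in a few lines. The version below connects each particle to its horizontal and vertical neighbors with springs and advances the system with a damped semi-implicit Euler step. The spring constant, damping factor, time step, gravity vector, and the choice of integrator are all assumptions made for illustration; the abstract does not give the prototype's parameters or numerical scheme, and `make_grid` and `step` are hypothetical names.

```python
import math


def make_grid(rows, cols, spacing=1.0):
    """Create particles on a flat rows x cols grid plus neighbor springs."""
    pos = {(r, c): [c * spacing, r * spacing, 0.0]
           for r in range(rows) for c in range(cols)}
    vel = {key: [0.0, 0.0, 0.0] for key in pos}
    springs = []  # entries are (particle a, particle b, rest length)
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                springs.append(((r, c), (r, c + 1), spacing))
            if r + 1 < rows:
                springs.append(((r, c), (r + 1, c), spacing))
    return pos, vel, springs


def step(pos, vel, springs, pinned, dt=0.01, k=50.0, damping=0.98,
         mass=1.0, gravity=(0.0, 0.0, -9.8)):
    """Advance the page by one time step; pinned particles do not move."""
    force = {p: [mass * g for g in gravity] for p in pos}
    for a, b, rest in springs:
        delta = [pos[b][i] - pos[a][i] for i in range(3)]
        length = math.sqrt(sum(x * x for x in delta)) or 1e-9
        magnitude = k * (length - rest)  # Hooke's law along the spring
        for i in range(3):
            force[a][i] += magnitude * delta[i] / length
            force[b][i] -= magnitude * delta[i] / length
    for p in pos:
        if p in pinned:
            continue
        for i in range(3):
            vel[p][i] = (vel[p][i] + dt * force[p][i] / mass) * damping
            pos[p][i] += dt * vel[p][i]
```

Pinning the column of particles along the spine and calling `step` repeatedly lets the rest of the page swing under gravity; a realistic page would also need bending resistance and user-driven forces, which this sketch leaves out.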