Using perception in managing unstructured documents
C. K. Cheng, Xiaoshan Pan
ACM Crossroads, 2003-12-01. DOI: 10.1145/1027328.1027333 (https://doi.org/10.1145/1027328.1027333)
Over the last ten years, the increased availability of documents in digital form has contributed significantly to the immense volume of knowledge and information available to computer users. The World Wide Web has become the largest digital library available, with more than one billion unique indexable web pages [12]. Yet because these documents are dynamic, fast-growing, and unstructured, it is increasingly difficult to identify and retrieve valuable information from them. More importantly, the usefulness of an unstructured document depends on the ease and efficiency with which its information can be retrieved [3]. In this paper, we define an unstructured document as a "general" document without a specific format, e.g., plain text. A document divided into sections or marked with paragraph tags, by contrast, is referred to as semi-structured, e.g., a formatted text document or a web page.