What does the frequency of occurrence of different words in an article have to do with the number of times an article is cited? Or, for that matter, with the number of publications an author has? All of these—word frequency, citation frequency, and publication frequency—obey a ubiquitous distribution called Zipf's law. Zipf's law applies as well to such diverse subjects as income distribution, firm size, and biological genera and species. Zipf in 1949 described a hyperbolic rank-frequency word distribution, which he fitted to a number of texts. He stated that if all unique words in a text are arranged (or ranked) in order of decreasing frequency of occurrence, the product of frequency times rank yields a constant which is approximately equal for all words in a text. The law has been shown to encompass many natural phenomena, and is equivalent to the distributions of Yule, Lotka, Pareto, Bradford, and Price. A ubiquitous empirical regularity suggests some universal principle. This article examines a number of theoretical derivations of the law, in order to show the relationship among the many attempts at ascertaining a theoretical justification for the phenomenon. We then briefly examine some of the ramifications of applying the law to the bibliographic database environment. The structure of the Zipf distribution resembles that of many other distributions, such as the Yule and Bradford distributions, and Lotka's law. Each has been observed as an empirical regularity in the study of many diverse subjects, ranging from the frequency of citation of published works to the distribution of the length of rugged coastline. What are the relationships among these phenomena? More importantly, how can one theoretically justify the existence of these regularities? This article is devoted to an explication of the appropriateness of the Zipf distribution to the word-frequency relation.
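The rank-frequency relation Zipf described can be illustrated with a short sketch. The tokenization and the toy sentence below are assumptions for illustration only; the article itself fits the law to real texts.

```python
from collections import Counter

def zipf_products(text):
    """Rank unique words by decreasing frequency and return
    (word, rank, frequency, rank * frequency) tuples.

    Under Zipf's law, rank * frequency should be roughly
    constant across words of a sufficiently long text.
    """
    counts = Counter(text.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(word, rank, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]
```

On a toy sentence the products are far from constant, but on book-length texts the product stabilizes, which is the empirical regularity the article sets out to explain.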
{"title":"The Theoretical Foundation of Zipf's Law and Its Application to the Bibliographic Database Environment","authors":"J. Fedorowicz","doi":"10.1002/asi.4630330507","DOIUrl":"https://doi.org/10.1002/asi.4630330507","url":null,"abstract":"What does the frequency of occurrence of different words in an article have to do with the number of times an article is cited? Or, for that matter, with the number of publications an author has? All of these—word frequency, citation frequency, and publication frequency—obey a ubiquitous distribution called Zipf's law. Zipf's law applies as well to such diverse subjects as income distribution, firm size, and biological genera and species. Zipf in 1949 described a hyperbolic rank-frequency word distribution, which he fitted to a number of texts. He stated that if all unique words in a text are arranged (or ranked) in order of decreasing frequency of occurrence, the product of frequency times rank yields a constant which is approximately equal for all words in a text. The law has been shown to encompass many natural phenomena, and is equivalent to the distributions of Yule, Lotka, Pareto, Bradford, and Price. A ubiquitous empirical regularity suggests some universal principle. This article examines a number of theoretical derivations of the law, in order to show the relationship among the many attempts at ascertaining a theoretical justification for the phenomenon. We then briefly examine some of the ramifications of applying the law to the bibliographic database environment. The structure of the Zipf distribution resembles that of many other distributions, such as the Yule and Bradford distributions, and Lotka's law. Each has been observed as an empirical regularity in the study of many diverse subjects, ranging from the frequency of citation of published works to the distribution of the length of rugged coastline. What are the relationships among these phenomena? 
More importantly, how can one theoretically justify the existence of these regularities? This article is devoted to an explication of the appropriateness of the Zipf distribution to the word-frequency relation","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"28 1","pages":"285-293"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83741076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The startup of Euronet in 1979 may eventually result in increased hourly charges to U.S. on-line users. This would arise from a drain-off of European business; access for Europeans to U.S. data processors has been made artificially expensive by the telecommunication charge established by the Postal and Telecommunications Administrations of the European Economic Community countries. Since the major data bases on the Euronet system will be U.S. produced, a plea is made for concerted effort by U.S. data producers and users to assure competitive communication rates in Europe.
{"title":"Opinion Paper: Euronet and its Effects on the U.S. Information Market","authors":"E. Brenner","doi":"10.1002/asi.4630300102","DOIUrl":"https://doi.org/10.1002/asi.4630300102","url":null,"abstract":"The startup of Euronet in 1979 may eventually result in increased hourly charges to U.S. on-line users. This would arise from a drain-off of European business; access for Europeans to U.S. data processors has been made artificially expensive by the telecommunication charge established by the Postal and Telecommunications Administrations of the European Economic Community countries. Since the major data bases on the Euronet system will be U.S. produced, a plea is made for concerted effort by U.S. data producers and users to assure competitive communication rates in Europe.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"31 1","pages":"5-8"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73655697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Technical information services which match the needs of the air pollution control community are now being provided. EPA studied alternatives for one month before settling on this new approach which relies on a multiplicity of existing technical information services. A file of more than 82,000 abstracts was transferred from EPA's computer and information retrieval system in Research Triangle Park, NC, to those of Lockheed in Palo Alto, CA. That file is being built at the new rate of 2500–4000 abstracts per year. EPA may resume publication of an abstract bulletin from that file and other files which contain the needed information. A quarterly catalog of all EPA reports, including those on air pollution, is for sale at NTIS. EPA is sponsoring literature searches from multiple files for an exclusive clientele. EPA's offices may make the abstracted documents accessible at their locations nationwide. Both personnel reductions and cost savings were achieved. The new computer and information retrieval system is satisfactory.
{"title":"Air Pollution Technical Information Network: A Revised Approach","authors":"Peter Halpin, J. Knight","doi":"10.1002/asi.4630300512","DOIUrl":"https://doi.org/10.1002/asi.4630300512","url":null,"abstract":"Technical information services which match the needs of the air pollution control community are now being provided. EPA studied alternatives for one month before settling on this new approach which relies on a multiplicity of existing technical information services. A file of more than 82,000 abstracts was transferred from EPA's computer and information retrieval system in Research Triangle Park, NC, to those of Lockheed in Palo Alto, CA. That file is being built at the new rate of 2500–4000 abstracts per year. EPA may resume publication of an abstract bulletin from that file and other files which contain the needed information. A quarterly catalog of all EPA reports, including those on air pollution, is for sale at NTIS. EPA is sponsoring literature searches from multiple files for an exclusive clientele. EPA's offices may make the abstracted documents accessible at their locations nationwide. Both personnel reductions and cost savings were achieved. The new computer and information retrieval system is satisfactory.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"8 3","pages":"315-316"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72578612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information systems are a series of formal processes by which the potential usefulness of a specific message being processed is enhanced, i.e., value is added. Energy, time, and money must be invested to change useless data to productive knowledge, a value-added process. Because ultimately usefulness, i.e., the determination of value, must rest with the user, it is necessary to describe the environments from which problems arise which require information for resolution. From an understanding of these environments, we can develop a better sensitivity to the users' perceptions of their benefits and costs as they use information systems. The aim of this article is to develop a different way of looking at information systems in which the information use becomes the prime design factor rather than technology and content.
{"title":"Value-Added Processes in the Information Life Cycle","authors":"Robert S. Taylor","doi":"10.1002/asi.4630330517","DOIUrl":"https://doi.org/10.1002/asi.4630330517","url":null,"abstract":"Information systems are a series of formal processes by which the potential usefulness of a specific message being processed is enhanced, i.e., value is added. Energy, time, and money must be invested to change useless data to productive knowledge, a value-added process. Because ultimately usefulness, i.e., the determination of value, must rest with the user, it is necessary to describe the environments from which problems arise which require information for resolution. From an understanding of these environments, we can develop a better sensitivity to the users' perceptions of their benefits and costs as they use information systems. The aim of this article is to develop a different way of looking at information systems in which the information use becomes the prime design factor rather than technology and content.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"10 1","pages":"341-346"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82054014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article offers a personal look back at the origins and early use of associative search techniques, and also a look forward at more theoretical approaches to the document retrieval problem. The purpose is to contrast the following two different ways of improving system performance: (1) appending associative search techniques to more or less standard (conventional) document retrieval systems, and (2) designing document retrieval systems based on more fundamental and appropriate principles, namely probabilistic design principles. Very recent work on probabilistic approaches to the document retrieval problem has provided a new (and rare) unification of two previously competing models. In light of this, I argue that if we had to choose the best way to improve performance of a document retrieval system, it would be wiser to implement, test, and evaluate this new unified model, rather than to continue to use associative techniques which are coupled to conventionally designed retrieval systems.
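The probabilistic design principle Maron contrasts with associative add-ons can be illustrated generically. The sketch below uses a standard Binary Independence Model term weight estimated without relevance information; it is an assumption for illustration, not the specific unified model the article discusses.

```python
import math

def bim_weights(docs, query_terms):
    """Binary Independence Model term weights from collection
    statistics alone (no relevance judgments): each query term
    gets an IDF-like weight log((N - n + 0.5) / (n + 0.5)),
    where n is the number of documents containing the term.
    """
    N = len(docs)
    weights = {}
    for term in query_terms:
        n = sum(1 for d in docs if term in d)
        weights[term] = math.log((N - n + 0.5) / (n + 0.5))
    return weights

def score(doc, weights):
    """Rank a document by summing the weights of matched query terms."""
    return sum(w for term, w in weights.items() if term in doc)
```

Ranking documents by decreasing probability of relevance, rather than by associative overlap, is the kind of "fundamental and appropriate principle" the abstract refers to.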
{"title":"Associative Search Techniques versus Probabilistic Retrieval Models","authors":"M. Maron","doi":"10.1002/asi.4630330510","DOIUrl":"https://doi.org/10.1002/asi.4630330510","url":null,"abstract":"This article offers a personal look back at the origins and early use of associative search techniques, and also a look forward at more theoretical approaches to the document retrieval problem. The purpose is to contrast the following two different ways of improving system performance: (1) appending associative search techniques to more or less standard (conventional) document retrieval systems, and (2) designing document retrieval systems based on more fundamental and appropriate principles, namely probabilistic design principles. Very recent work on probabilistic approaches to the document retrieval problem has provided a new (and rare) unification of two previously competing models. In light of this, I argue that if we had to choose the best way to improve performance of a document retrieval system, it would be wiser to implement, test, and evaluate this new unified model, rather than to continue to use associative techniques which are coupled to conventionally designed retrieval systems.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"56 1","pages":"308-310"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86943433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article describes an ongoing project in the automatic classification of Louis Harris survey questions. The purpose of the project is to explore the feasibility of automatically organizing the questions in a form that can be usefully presented to researchers and to investigate the problems associated with automatic classification of natural language text. In general, the experiment reported here supports the belief that an important application of automatic classification is to transform an intractable mass of data into a structure suitable for further human processing.
{"title":"Automatic Classification of Harris Survey Questions: An Experiment in the Organization of Information","authors":"M. Dillon","doi":"10.1002/asi.4630330508","DOIUrl":"https://doi.org/10.1002/asi.4630330508","url":null,"abstract":"This article describes an ongoing project in the automatic classification of Louis Harris survey questions. The purpose of the project is to explore the feasibility of automatically organizing the questions in a form that can be usefully presented to researchers and to investigate the problems associated with automatic classification of natural language text. In general, the experiment reported here supports the belief that an important application of automatic classification is to transform an intractable mass of data into a structure suitable for further human processing.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"44 1","pages":"294-301"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86445171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Sullo, W. Wallace, T. Triscari, Cathy A. Chazen, James F. Davis
A reliability theoretic construct is proposed for conceptualizing the process of information flow. It focuses on information produced to satisfy specified purposes or to achieve preconceived objectives. Furthermore, the model incorporates explicitly the concept of an information producer contemplating a choice of action in an uncertain environment. The resulting models are therefore prescriptive in nature. The usefulness of this construct is illustrated by a case analysis of the effectiveness of natural resource data products in land-use decision making. Measures of system reliability of the information flow network are determined and sensitivity analyses performed. Numerical examples are presented and discussed. The prescriptive nature of this approach permits use of its results to indicate how a data producer can increase the effectiveness of documents by identifying the information flow network, assessing the reliability of each component in the network, finding measures of system reliability, and performing sensitivity analyses to identify the critical components of the system. The result is a closer congruence between the objectives of the data producer and the requirements of users.
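The system-reliability measures the abstract mentions can be sketched with the two standard composition rules of reliability theory. The series/parallel decomposition below is a generic illustration under assumed component reliabilities, not the article's actual information-flow network or its measures.

```python
def series_reliability(component_rs):
    """Reliability of components in series: the flow succeeds
    only if every component works, so reliabilities multiply."""
    r = 1.0
    for ri in component_rs:
        r *= ri
    return r

def parallel_reliability(component_rs):
    """Reliability of redundant (parallel) components: the flow
    succeeds if at least one works, so failure probabilities
    (1 - r_i) multiply."""
    q = 1.0
    for ri in component_rs:
        q *= (1.0 - ri)
    return 1.0 - q
```

Sensitivity analysis in this framework amounts to perturbing one component's reliability and observing the change in the system measure, which is how critical components of the network would be identified.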
{"title":"A Reliability Theoretic Construct for Assessing Information Flow in Networks","authors":"P. Sullo, W. Wallace, T. Triscari, Cathy A. Chazen, James F. Davis","doi":"10.1002/asi.4630300106","DOIUrl":"https://doi.org/10.1002/asi.4630300106","url":null,"abstract":"A reliability theoretic construct is proposed for conceptualizing the process of information flow. It focuses on information produced to satisfy specified purposes or to achieve preconceived objectives. Furthermore, the model incorporates explicitly the concept of an information producer contemplating a choice of action in an uncertain environment. The resulting models are therefore prescriptive in nature. The usefulness of this construct is illustrated by a case analysis of the effectiveness of natural resource data products in land-use decision making. Measures of system reliability of the information flow network are determined and sensitivity analyses performed. Numerical examples are presented and discussed. The prescriptive nature of this approach permits use of its results to indicate how a data producer can increase the effectiveness of documents by identifying the information flow network, assessing the reliability of each component in the network, finding measures of system reliability, and performing sensitivity analyses to identify the critical components of the system. 
The result is a closer congruence between the objectives of the data producer and the requirements of users.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"107 1","pages":"25-32"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74199029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characteristic of today's socially, economically, and technologically complex society is an ever-growing information output coupled with a constantly increasing reliance on information. Information is considered a resource and there is recognition of the political and economic value of information. Superimposed on the new information environment is a sophisticated information technology raising a variety of issues relating to such things as the freedom and privacy of the individual, its effect on those who interact with it, and the new social stratification system it seems to imply.
{"title":"Man, Information, and Society: New Patterns of Interaction","authors":"S. Artandi","doi":"10.1002/asi.4630300104","DOIUrl":"https://doi.org/10.1002/asi.4630300104","url":null,"abstract":"Characteristic of today's socially, economically, and technologically complex society is an ever-growing information output coupled with a constantly increasing reliance on information. Information is considered a resource and there is recognition of the political and economic value of information. Superimposed on the new information environment is a sophisticated information technology raising a variety of issues relating to such things as the freedom and privacy of the individual, its effect on those who interact with it, and the new social stratification system it seems to imply.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"236 1 1","pages":"15-18"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72941735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The problems of resource allocation in the school library are analyzed and a practical operations research (O.R.) approach towards accountability is presented. A discussion of the nine‐step solution procedure is given, including the use of four planning instruments: inventory of services, preference form, data collection guide, and program costing matrix. The use of cost‐benefit analysis is shown to be helpful in determining the “best” allocation strategy. There is a presentation of implementation suggestions, and examples of the use of the methodology in actual school situations are given. Extensions of the work from building level school library media programs to district (system) and regional level learning resource (media) programs are also presented.
{"title":"Planning and Budgeting for School Media Programs at the Building, District, and Regional Levels: O.R. in the Little Red Schoolhouse","authors":"D. Kraft, James W. Liesener","doi":"10.1002/asi.4630300108","DOIUrl":"https://doi.org/10.1002/asi.4630300108","url":null,"abstract":"The problems of resource allocation in the school library are analyzed and a practical operations research (O.R.) approach towards accountability is presented. A discussion of the nine‐step solution procedure is given, including the use of four planning instruments: inventory of services, preference form, data collection guide, and program costing matrix. The use of cost‐benefit analysis is shown to be helpful in determining the “best” allocation strategy. There is a presentation of implementation suggestions, and examples of the use of the methodology in actual school situations are given. Extensions of the work from building level school library media programs to district (system) and regional level learning resource (media) programs are also presented.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"1 1","pages":"41-50"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79031232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article is concerned with a layout of methodological apparatus revealing the logical transition of information theory from one kind to another, creating from the existing information theory (i.e., informal theory) (i) a near‐formal information‐theoretic axiomatic system and (ii) an information metatheory (only briefly sketched here). The most outstanding example manifesting such an evolution (“optimization”) in the methodology of natural sciences is the transition of mathematics to meta‐mathematics. The significance of this approach lies in defining the subject matter to be studied, as well as in explicating its features, and possibly forecasting new ones. Based on this knowledge an attempt to reconstruct a known model of scientific evolution of S. Watanabe is made. The modification of this model is “case‐oriented,” i.e., it depends on the mentioned metatheoretical reasons. The evolution of the information systems science via such a dynamical tool as the metatheoretical approach is grounded most generally in the developed informal theory confronted with human and social needs.
{"title":"An Evolutionary Approach in Information Systems Science","authors":"N. Stanoulov","doi":"10.1002/asi.4630330511","DOIUrl":"https://doi.org/10.1002/asi.4630330511","url":null,"abstract":"This article is concerned with a layout of methodological apparatus revealing the logical transition of information theory from one kind to another, creating from the existing information theory (i.e., informal theory) (i) a near‐formal information‐theoretic axiomatic system and (ii) an information metatheory (only briefly sketched here). The most outstanding example manifesting such an evolution (“optimization”) in the methodology of natural sciences is the transition of mathematics to meta‐mathematics. The significance of this approach lies in defining the subject matter to be studied, as well as in explicating its features, and possibly forecasting new ones. Based on this knowledge an attempt to reconstruct a known model of scientific evolution of S. Watanabe is made. The modification of this model is “case‐oriented,” i.e., it depends on the mentioned metatheoretical reasons. The evolution of the information systems science via such a dynamical tool as the metatheoretical approach is grounded most generally in the developed informal theory confronted with human and social needs.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"1 1","pages":"311-316"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78009772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}