What does the frequency of occurrence of different words in an article have to do with the number of times an article is cited? Or, for that matter, with the number of publications an author has? All of these—word frequency, citation frequency, and publication frequency—obey a ubiquitous distribution called Zipf's law. Zipf's law applies as well to such diverse subjects as income distribution, firm size, and biological genera and species. Zipf in 1949 described a hyperbolic rank-frequency word distribution, which he fitted to a number of texts. He stated that if all unique words in a text are arranged (or ranked) in order of decreasing frequency of occurrence, the product of frequency times rank yields a constant which is approximately equal for all words in a text. The law has been shown to encompass many natural phenomena, and is equivalent to the distributions of Yule, Lotka, Pareto, Bradford, and Price. A ubiquitous empirical regularity suggests some universal principle. This article examines a number of theoretical derivations of the law, in order to show the relationship among the many attempts at ascertaining a theoretical justification for the phenomenon. We then briefly examine some of the ramifications of applying the law to the bibliographic database environment. The structure of the Zipf distribution resembles that of many other distributions, such as the Yule and Bradford distributions, and Lotka's law. Each has been observed as an empirical regularity in the study of many diverse subjects, ranging from the frequency of citation of published works to the distribution of the length of rugged coastline. What are the relationships among these phenomena? More importantly, how can one theoretically justify the existence of these regularities? This article is devoted to an explication of the appropriateness of the Zipf distribution to the word-frequency relation.
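The rank-frequency relation Zipf described can be illustrated with a short sketch. The tokenization and the toy sentence below are assumptions for illustration only; the article itself fits the law to real texts.

```python
from collections import Counter

def zipf_products(text):
    """Rank unique words by decreasing frequency and return
    (word, rank, frequency, rank * frequency) tuples.

    Under Zipf's law, rank * frequency should be roughly
    constant across words of a sufficiently long text.
    """
    counts = Counter(text.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(word, rank, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]
```

On a toy sentence the products are far from constant, but on book-length texts the product stabilizes, which is the empirical regularity the article sets out to explain.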
{"title":"The Theoretical Foundation of Zipf's Law and Its Application to the Bibliographic Database Environment","authors":"J. Fedorowicz","doi":"10.1002/asi.4630330507","DOIUrl":"https://doi.org/10.1002/asi.4630330507","url":null,"abstract":"What does the frequency of occurrence of different words in an article have to do with the number of times an article is cited? Or, for that matter, with the number of publications an author has? All of these—word frequency, citation frequency, and publication frequency—obey a ubiquitous distribution called Zipf's law. Zipf's law applies as well to such diverse subjects as income distribution, firm size, and biological genera and species. Zipf in 1949 described a hyperbolic rank-frequency word distribution, which he fitted to a number of texts. He stated that if all unique words in a text are arranged (or ranked) in order of decreasing frequency of occurrence, the product of frequency times rank yields a constant which is approximately equal for all words in a text. The law has been shown to encompass many natural phenomena, and is equivalent to the distributions of Yule, Lotka, Pareto, Bradford, and Price. A ubiquitous empirical regularity suggests some universal principle. This article examines a number of theoretical derivations of the law, in order to show the relationship among the many attempts at ascertaining a theoretical justification for the phenomenon. We then briefly examine some of the ramifications of applying the law to the bibliographic database environment. The structure of the Zipf distribution resembles that of many other distributions, such as the Yule and Bradford distributions, and Lotka's law. Each has been observed as an empirical regularity in the study of many diverse subjects, ranging from the frequency of citation of published works to the distribution of the length of rugged coastline. What are the relationships among these phenomena? 
More importantly, how can one theoretically justify the existence of these regularities? This article is devoted to an explication of the appropriateness of the Zipf distribution to the word-frequency relation","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"28 1","pages":"285-293"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83741076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The startup of Euronet in 1979 may eventually result in increased hourly charges to U.S. on-line users. This would arise from a drain-off of European business; access for Europeans to U.S. data processors has been made artificially expensive by the telecommunication charge established by the Postal and Telecommunications Administrations of the European Economic Community countries. Since the major data bases on the Euronet system will be U.S. produced, a plea is made for concerted effort by U.S. data producers and users to assure competitive communication rates in Europe.
{"title":"Opinion Paper: Euronet and its Effects on the U.S. Information Market","authors":"E. Brenner","doi":"10.1002/asi.4630300102","DOIUrl":"https://doi.org/10.1002/asi.4630300102","url":null,"abstract":"The startup of Euronet in 1979 may eventually result in increased hourly charges to U.S. on-line users. This would arise from a drain-off of European business; access for Europeans to U.S. data processors has been made artificially expensive by the telecommunication charge established by the Postal and Telecommunications Administrations of the European Economic Community countries. Since the major data bases on the Euronet system will be U.S. produced, a plea is made for concerted effort by U.S. data producers and users to assure competitive communication rates in Europe.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"31 1","pages":"5-8"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73655697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Technical information services which match the needs of the air pollution control community are now being provided. EPA studied alternatives for one month before settling on this new approach which relies on a multiplicity of existing technical information services. A file of more than 82,000 abstracts was transferred from EPA's computer and information retrieval system in Research Triangle Park, NC, to those of Lockheed in Palo Alto, CA. That file is being built at the new rate of 2500–4000 abstracts per year. EPA may resume publication of an abstract bulletin from that file and other files which contain the needed information. A quarterly catalog of all EPA reports, including those on air pollution, is for sale at NTIS. EPA is sponsoring literature searches from multiple files for an exclusive clientele. EPA's offices may make the abstracted documents accessible at their locations nationwide. Both personnel reductions and cost savings were achieved. The new computer and information retrieval system is satisfactory.
{"title":"Air Pollution Technical Information Network: A Revised Approach","authors":"Peter Halpin, J. Knight","doi":"10.1002/asi.4630300512","DOIUrl":"https://doi.org/10.1002/asi.4630300512","url":null,"abstract":"Technical information services which match the needs of the air pollution control community are now being provided. EPA studied alternatives for one month before settling on this new approach which relies on a multiplicity of existing technical information services. A file of more than 82,000 abstracts was transferred from EPA's computer and information retrieval system in Research Triangle Park, NC, to those of Lockheed in Palo Alto, CA. That file is being built at the new rate of 2500–4000 abstracts per year. EPA may resume publication of an abstract bulletin from that file and other files which contain the needed information. A quarterly catalog of all EPA reports, including those on air pollution, is for sale at NTIS. EPA is sponsoring literature searches from multiple files for an exclusive clientele. EPA's offices may make the abstracted documents accessible at their locations nationwide. Both personnel reductions and cost savings were achieved. The new computer and information retrieval system is satisfactory.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"8 3","pages":"315-316"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72578612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information systems are a series of formal processes by which the potential usefulness of a specific message being processed is enhanced, i.e., value is added. Energy, time, and money must be invested to change useless data to productive knowledge, a value-added process. Because ultimately usefulness, i.e., the determination of value, must rest with the user, it is necessary to describe the environments from which problems arise which require information for resolution. From an understanding of these environments, we can develop a better sensitivity to the users' perceptions of their benefits and costs as they use information systems. The aim of this article is to develop a different way of looking at information systems in which the information use becomes the prime design factor rather than technology and content.
{"title":"Value-Added Processes in the Information Life Cycle","authors":"Robert S. Taylor","doi":"10.1002/asi.4630330517","DOIUrl":"https://doi.org/10.1002/asi.4630330517","url":null,"abstract":"Information systems are a series of formal processes by which the potential usefulness of a specific message being processed is enhanced, i.e., value is added. Energy, time, and money must be invested to change useless data to productive knowledge, a value-added process. Because ultimately usefulness, i.e., the determination of value, must rest with the user, it is necessary to describe the environments from which problems arise which require information for resolution. From an understanding of these environments, we can develop a better sensitivity to the users' perceptions of their benefits and costs as they use information systems. The aim of this article is to develop a different way of looking at information systems in which the information use becomes the prime design factor rather than technology and content.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"10 1","pages":"341-346"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82054014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article offers a personal look back at the origins and early use of associative search techniques, and also a look forward at more theoretical approaches to the document retrieval problem. The purpose is to contrast the following two different ways of improving system performance: (1) appending associative search techniques to more or less standard (conventional) document retrieval systems, and (2) designing document retrieval systems based on more fundamental and appropriate principles, namely probabilistic design principles. Very recent work on probabilistic approaches to the document retrieval problem has provided a new (and rare) unification of two previously competing models. In light of this, I argue that if we had to choose the best way to improve performance of a document retrieval system, it would be wiser to implement, test, and evaluate this new unified model, rather than to continue to use associative techniques which are coupled to conventionally designed retrieval systems.
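The probabilistic design principle Maron contrasts with associative add-ons can be illustrated generically. The sketch below uses a standard Binary Independence Model term weight estimated without relevance information; it is an assumption for illustration, not the specific unified model the article discusses.

```python
import math

def bim_weights(docs, query_terms):
    """Binary Independence Model term weights from collection
    statistics alone (no relevance judgments): each query term
    gets an IDF-like weight log((N - n + 0.5) / (n + 0.5)),
    where n is the number of documents containing the term.
    """
    N = len(docs)
    weights = {}
    for term in query_terms:
        n = sum(1 for d in docs if term in d)
        weights[term] = math.log((N - n + 0.5) / (n + 0.5))
    return weights

def score(doc, weights):
    """Rank a document by summing the weights of matched query terms."""
    return sum(w for term, w in weights.items() if term in doc)
```

Ranking documents by decreasing probability of relevance, rather than by associative overlap, is the kind of "fundamental and appropriate principle" the abstract refers to.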
{"title":"Associative Search Techniques versus Probabilistic Retrieval Models","authors":"M. Maron","doi":"10.1002/asi.4630330510","DOIUrl":"https://doi.org/10.1002/asi.4630330510","url":null,"abstract":"This article offers a personal look back at the origins and early use of associative search techniques, and also a look forward at more theoretical approaches to the document retrieval problem. The purpose is to contrast the following two different ways of improving system performance: (1) appending associative search techniques to more or less standard (conventional) document retrieval systems, and (2) designing document retrieval systems based on more fundamental and appropriate principles, namely probabilistic design principles. Very recent work on probabilistic approaches to the document retrieval problem has provided a new (and rare) unification of two previously competing models. In light of this, I argue that if we had to choose the best way to improve performance of a document retrieval system, it would be wiser to implement, test, and evaluate this new unified model, rather than to continue to use associative techniques which are coupled to conventionally designed retrieval systems.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"56 1","pages":"308-310"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86943433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article describes an ongoing project in the automatic classification of Louis Harris survey questions. The purpose of the project is to explore the feasibility of automatically organizing the questions in a form that can be usefully presented to researchers and to investigate the problems associated with automatic classification of natural language text. In general, the experiment reported here supports the belief that an important application of automatic classification is to transform an intractable mass of data into a structure suitable for further human processing.
{"title":"Automatic Classification of Harris Survey Questions: An Experiment in the Organization of Information","authors":"M. Dillon","doi":"10.1002/asi.4630330508","DOIUrl":"https://doi.org/10.1002/asi.4630330508","url":null,"abstract":"This article describes an ongoing project in the automatic classification of Louis Harris survey questions. The purpose of the project is to explore the feasibility of automatically organizing the questions in a form that can be usefully presented to researchers and to investigate the problems associated with automatic classification of natural language text. In general, the experiment reported here supports the belief that an important application of automatic classification is to transform an intractable mass of data into a structure suitable for further human processing.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"44 1","pages":"294-301"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86445171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Sullo, W. Wallace, T. Triscari, Cathy A. Chazen, James F. Davis
A reliability theoretic construct is proposed for conceptualizing the process of information flow. It focuses on information produced to satisfy specified purposes or to achieve preconceived objectives. Furthermore, the model incorporates explicitly the concept of an information producer contemplating a choice of action in an uncertain environment. The resulting models are therefore prescriptive in nature. The usefulness of this construct is illustrated by a case analysis of the effectiveness of natural resource data products in land-use decision making. Measures of system reliability of the information flow network are determined and sensitivity analyses performed. Numerical examples are presented and discussed. The prescriptive nature of this approach permits use of its results to indicate how a data producer can increase the effectiveness of documents by identifying the information flow network, assessing the reliability of each component in the network, finding measures of system reliability, and performing sensitivity analyses to identify the critical components of the system. The result is a closer congruence between the objectives of the data producer and the requirements of users.
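The system-reliability measures the abstract mentions can be sketched with the two standard composition rules of reliability theory. The series/parallel decomposition below is a generic illustration under assumed component reliabilities, not the article's actual information-flow network or its measures.

```python
def series_reliability(component_rs):
    """Reliability of components in series: the flow succeeds
    only if every component works, so reliabilities multiply."""
    r = 1.0
    for ri in component_rs:
        r *= ri
    return r

def parallel_reliability(component_rs):
    """Reliability of redundant (parallel) components: the flow
    succeeds if at least one works, so failure probabilities
    (1 - r_i) multiply."""
    q = 1.0
    for ri in component_rs:
        q *= (1.0 - ri)
    return 1.0 - q
```

Sensitivity analysis in this framework amounts to perturbing one component's reliability and observing the change in the system measure, which is how critical components of the network would be identified.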
{"title":"A Reliability Theoretic Construct for Assessing Information Flow in Networks","authors":"P. Sullo, W. Wallace, T. Triscari, Cathy A. Chazen, James F. Davis","doi":"10.1002/asi.4630300106","DOIUrl":"https://doi.org/10.1002/asi.4630300106","url":null,"abstract":"A reliability theoretic construct is proposed for conceptualizing the process of information flow. It focuses on information produced to satisfy specified purposes or to achieve preconceived objectives. Furthermore, the model incorporates explicitly the concept of an information producer contemplating a choice of action in an uncertain environment. The resulting models are therefore prescriptive in nature. The usefulness of this construct is illustrated by a case analysis of the effectiveness of natural resource data products in land-use decision making. Measures of system reliability of the information flow network are determined and sensitivity analyses performed. Numerical examples are presented and discussed. The prescriptive nature of this approach permits use of its results to indicate how a data producer can increase the effectiveness of documents by identifying the information flow network, assessing the reliability of each component in the network, finding measures of system reliability, and performing sensitivity analyses to identify the critical components of the system. 
The result is a closer congruence between the objectives of the data producer and the requirements of users.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"107 1","pages":"25-32"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74199029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characteristic of today's socially, economically, and technologically complex society is an ever-growing information output coupled with a constantly increasing reliance on information. Information is considered a resource and there is recognition of the political and economic value of information. Superimposed on the new information environment is a sophisticated information technology raising a variety of issues relating to such things as the freedom and privacy of the individual, its effect on those who interact with it, and the new social stratification system it seems to imply.
{"title":"Man, Information, and Society: New Patterns of Interaction","authors":"S. Artandi","doi":"10.1002/asi.4630300104","DOIUrl":"https://doi.org/10.1002/asi.4630300104","url":null,"abstract":"Characteristic of today's socially, economically, and technologically complex society is an ever-growing information output coupled with a constantly increasing reliance on information. Information is considered a resource and there is recognition of the political and economic value of information. Superimposed on the new information environment is a sophisticated information technology raising a variety of issues relating to such things as the freedom and privacy of the individual, its effect on those who interact with it, and the new social stratification system it seems to imply.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"236 1 1","pages":"15-18"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72941735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The problems of resource allocation in the school library are analyzed and a practical operations research (O.R.) approach towards accountability is presented. A discussion of the nine‐step solution procedure is given, including the use of four planning instruments: inventory of services, preference form, data collection guide, and program costing matrix. The use of cost‐benefit analysis is shown to be helpful in determining the “best” allocation strategy. There is a presentation of implementation suggestions, and examples of the use of the methodology in actual school situations are given. Extensions of the work from building level school library media programs to district (system) and regional level learning resource (media) programs are also presented.
{"title":"Planning and Budgeting for School Media Programs at the Building, District, and Regional Levels: O.R. in the Little Red Schoolhouse","authors":"D. Kraft, James W. Liesener","doi":"10.1002/asi.4630300108","DOIUrl":"https://doi.org/10.1002/asi.4630300108","url":null,"abstract":"The problems of resource allocation in the school library are analyzed and a practical operations research (O.R.) approach towards accountability is presented. A discussion of the nine‐step solution procedure is given, including the use of four planning instruments: inventory of services, preference form, data collection guide, and program costing matrix. The use of cost‐benefit analysis is shown to be helpful in determining the “best” allocation strategy. There is a presentation of implementation suggestions, and examples of the use of the methodology in actual school situations are given. Extensions of the work from building level school library media programs to district (system) and regional level learning resource (media) programs are also presented.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"1 1","pages":"41-50"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79031232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article is concerned with a layout of methodological apparatus revealing the logical transition of information theory from one kind to another, creating from the existing information theory (i.e., informal theory) (i) a near‐formal information‐theoretic axiomatic system and (ii) an information metatheory (only briefly sketched here). The most outstanding example manifesting such an evolution (“optimization”) in the methodology of natural sciences is the transition of mathematics to meta‐mathematics. The significance of this approach lies in defining the subject matter to be studied, as well as in explicating its features, and possibly forecasting new ones. Based on this knowledge an attempt to reconstruct a known model of scientific evolution of S. Watanabe is made. The modification of this model is “case‐oriented,” i.e., it depends on the mentioned metatheoretical reasons. The evolution of the information systems science via such a dynamical tool as the metatheoretical approach is grounded most generally in the developed informal theory confronted with human and social needs.
{"title":"An Evolutionary Approach in Information Systems Science","authors":"N. Stanoulov","doi":"10.1002/asi.4630330511","DOIUrl":"https://doi.org/10.1002/asi.4630330511","url":null,"abstract":"This article is concerned with a layout of methodological apparatus revealing the logical transition of information theory from one kind to another, creating from the existing information theory (i.e., informal theory) (i) a near‐formal information‐theoretic axiomatic system and (ii) an information metatheory (only briefly sketched here). The most outstanding example manifesting such an evolution (“optimization”) in the methodology of natural sciences is the transition of mathematics to meta‐mathematics. The significance of this approach lies in defining the subject matter to be studied, as well as in explicating its features, and possibly forecasting new ones. Based on this knowledge an attempt to reconstruct a known model of scientific evolution of S. Watanabe is made. The modification of this model is “case‐oriented,” i.e., it depends on the mentioned metatheoretical reasons. The evolution of the information systems science via such a dynamical tool as the metatheoretical approach is grounded most generally in the developed informal theory confronted with human and social needs.","PeriodicalId":50013,"journal":{"name":"Journal of the American Society for Information Science and Technology","volume":"1 1","pages":"311-316"},"PeriodicalIF":0.0,"publicationDate":"2007-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78009772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}