ACM Trans. Manag. Inf. Syst.: Latest Articles (Page 3)

Is Combining Contextual and Behavioral Targeting Strategies Effective in Online Advertising?

ACM Trans. Manag. Inf. Syst.

Pub Date : 2016-03-29 DOI: 10.1145/2883816

Xianghua Lu, Xia Zhao, Ling Xue

Online targeting is increasingly used to deliver ads to consumers, but discovering how to target the most valuable web visitors and generate a high response rate remains a challenge for advertising intermediaries and advertisers. The purpose of this study is to examine how behavioral targeting (BT) impacts users’ responses to online ads, and particularly whether BT works better in combination with contextual targeting (CT). Using a large, individual-level clickstream data set of an automobile advertising campaign obtained from an Internet advertising intermediary, this study examines the impact of BT and CT strategies on users’ click behavior. The results show that (1) targeting a user with behavioral characteristics that are closely related to the ads does not necessarily increase click-through rates (CTRs), whereas targeting a user with behavioral characteristics that are loosely related to the ads leads to a higher CTR, and (2) BT and CT work better in combination. Our study contributes to the online advertising design literature and provides important managerial implications for advertising intermediaries and advertisers on targeting individual users.

Citations: 18

Who's Next? Scheduling Personalization Services with Variable Service Times

ACM Trans. Manag. Inf. Syst.

Pub Date : 2015-07-08 DOI: 10.1145/2764920

Dengpan Liu, S. Sarkar, C. Sriskandarajah

Online personalization has become quite prevalent in recent years, with firms able to derive additional profits from such services. As the adoption of such services grows, firms implementing such practices face some operational challenges. One important challenge lies in the complexity associated with the personalization process and how to deploy available resources to handle such complexity. The complexity is exacerbated when a site faces a large volume of requests in a short amount of time, as is often the case for e-commerce and content delivery sites. In such situations, it is generally not possible for a site to provide perfectly personalized service to all requests. Instead, a firm can provide differentiated service to requests based on the amount of profiling information available about the visitor. We consider a scenario where the revenue function is concave, capturing the diminishing returns from personalization effort. Using a batching approach, we determine the optimal scheduling policy (i.e., time allocation and sequence of service) for a batch that accounts for the externality cost incurred when a request is provided service before other waiting requests. The batching approach leads to sunk costs incurred when visitors wait for the next batch to begin. An optimal admission control policy is developed to prescreen new request arrivals. We show how the policy can be implemented efficiently when the revenue function is complex and there are a large number of requests that can be served in a batch. Numerical experiments show that the proposed approach leads to substantial improvements over a linear approximation of the concave revenue function. Interestingly, we find that the improvements in firm profits are not only (or even primarily) due to the different service times that are obtained when using the nonlinear personalization function. There is also a ripple effect on the admission control policy that incorporates these optimized service times, and this effect contributes even more to the additional profits than the service time optimization by itself.
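
The role of a concave revenue function in time allocation can be illustrated with a small sketch. This is not the authors' scheduling policy: it ignores sequencing, externality costs, and admission control, and it assumes a hypothetical revenue curve `weights[i] * log(1 + t)` for each request. Under that assumption, handing each successive time unit to the request with the largest marginal gain is optimal for the discretized problem, because every curve has diminishing returns.

```python
import heapq
import math


def allocate_time(weights, budget, step=1):
    """Greedily split `budget` time units across requests.

    Assumption (illustrative only): request i earns
    weights[i] * log(1 + t) for t units of service.
    """
    alloc = [0] * len(weights)

    def gain(i):
        # Marginal revenue of giving request i one more `step` of time.
        t = alloc[i]
        return weights[i] * (math.log1p(t + step) - math.log1p(t))

    # Max-heap emulated with negated gains: (negative gain, request index).
    heap = [(-gain(i), i) for i in range(len(weights))]
    heapq.heapify(heap)
    remaining = budget
    while remaining >= step and heap:
        _, i = heapq.heappop(heap)
        alloc[i] += step
        remaining -= step
        heapq.heappush(heap, (-gain(i), i))
    return alloc


def total_revenue(weights, alloc):
    """Total revenue under the assumed logarithmic revenue curves."""
    return sum(w * math.log1p(t) for w, t in zip(weights, alloc))


weights = [3.0, 1.0]          # hypothetical profile "richness" per request
greedy = allocate_time(weights, budget=4)
equal = [2, 2]                # naive equal split for comparison
```

With these toy weights, the greedy allocation favors the request with the richer profile but still gives some time to the other one, which is the qualitative behavior a linear revenue approximation cannot reproduce (a linear model would give all the time to the higher-weight request).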

Citations: 1

Role-Based Process View Derivation and Composition

ACM Trans. Manag. Inf. Syst.

Pub Date : 2015-07-08 DOI: 10.1145/2744207

Xiaohui Zhao, Chengfei Liu, Sira Yongchareon, M. Kowalkiewicz, Wasim Sadiq

The process view concept deploys a partial and temporal representation to adjust the visible view of a business process according to various perception constraints of users. Process view technology is of practical use for privacy protection and authorization control in process-oriented business management. Owing to complex organizational structures, it is challenging for large companies to accurately specify the diverse perceptions that different users have of business processes. To tackle this issue, this article presents a role-based process view model that incorporates role dependencies into process view derivation. Compared to existing process view approaches, ours specifically supports runtime updates to the process view perceivable to a user through dedicated view merging operations, thereby enabling the dynamic tracing of process perception. A series of rules and theorems are established to guarantee the structural consistency and validity of process view transformation. A hypothetical case is presented to illustrate the feasibility of our approach, and a prototype is developed as a proof of concept.

Citations: 6

Investigating Task Coordination in Globally Dispersed Teams: A Structural Contingency Perspective

ACM Trans. Manag. Inf. Syst.

Pub Date : 2015-07-08 DOI: 10.1145/2688489

J. Sutanto, A. Kankanhalli, B. Tan

Task coordination poses significant challenges for globally dispersed teams (GDTs). Although various task coordination mechanisms have been proposed for such teams, there is a lack of systematic examination of the appropriate coordination mechanisms for different teams based on the nature of their task and the context in which they operate. Prior studies on collocated teams suggest matching their levels of task dependence to specific task coordination mechanisms for effective coordination. This research goes beyond the earlier work by also considering additional contextual factors of GDTs (i.e., temporal dispersion and time constraints) in deriving their optimal IT-mediated task coordination mechanisms. Adopting structural contingency theory, we propose optimal IT-mediated task coordination portfolios to fit the different levels of task dependence, temporal dispersion, and perceived time constraint of GDTs. The proposed fit is tested through a survey and profile analysis of 95 globally dispersed software development teams in a large financial organization. We find, as hypothesized, that the extent of fit between the actual IT-mediated task coordination portfolios used by the surveyed teams and their optimal portfolios proposed here is positively related to their task coordination effectiveness, which in turn impacts the team's efficiency and effectiveness. The implications for theory and practice are discussed.
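
Profile analysis is commonly operationalized as a distance between a team's observed profile and an ideal one, with a smaller distance meaning a better fit. The sketch below assumes the Euclidean form of that distance and uses hypothetical usage scores for three coordination mechanisms; the abstract does not state which distance measure or which portfolio dimensions the authors used.

```python
import math


def profile_deviation(actual, ideal):
    """Euclidean distance between an actual and an ideal portfolio.

    Each position holds the usage level of one coordination mechanism
    (the dimensions and scores here are hypothetical). A smaller
    deviation means a closer fit to the ideal profile.
    """
    if len(actual) != len(ideal):
        raise ValueError("profiles must have the same number of dimensions")
    return math.sqrt(sum((a - i) ** 2 for a, i in zip(actual, ideal)))


ideal_portfolio = [5, 2, 4]    # hypothetical ideal usage levels
team_a = [5, 2, 4]             # matches the ideal exactly
team_b = [2, 6, 4]             # deviates on two mechanisms

misfit_a = profile_deviation(team_a, ideal_portfolio)
misfit_b = profile_deviation(team_b, ideal_portfolio)
```

In a study of this kind, the misfit score would then be related to an outcome such as coordination effectiveness, with the expectation of a negative association.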

Citations: 7

Understanding Business Ecosystem Dynamics: A Data-Driven Approach

ACM Trans. Manag. Inf. Syst.

Pub Date : 2015-07-08 DOI: 10.1145/2724730

Rahul C. Basole, Martha G. Russell, Jukka Huhtamäki, Neil Rubens, Kaisa Still, Hyunwoo Park

Business ecosystems consist of a heterogeneous and continuously evolving set of entities that are interconnected through a complex, global network of relationships. However, there is no well-established methodology to study the dynamics of this network. Traditional approaches have primarily utilized a single source of data of relatively established firms; however, these approaches ignore the vast number of relevant activities that often occur at the individual and entrepreneurial levels. We argue that a data-driven visualization approach, using both institutionally and socially curated datasets, can provide important complementary, triangulated explanatory insights into the dynamics of interorganizational networks in general and business ecosystems in particular. We develop novel visualization layouts to help decision makers systemically identify and compare ecosystems. Using traditionally disconnected data sources on deals and alliance relationships (DARs), executive and funding relationships (EFRs), and public opinion and discourse (POD), we empirically illustrate our data-driven method of data triangulation and visualization techniques through three cases in the mobile industry: Google’s acquisition of Motorola Mobility, the coopetitive relation between Apple and Samsung, and the strategic partnership between Nokia and Microsoft. The article concludes with implications and future research opportunities.

Citations: 124

Stakeholder Analyses of Firm-Related Web Forums: Applications in Stock Return Prediction

ACM Trans. Manag. Inf. Syst.

Pub Date : 2015-04-03 DOI: 10.1145/2675693

David Zimbra, Hsinchun Chen, R. Lusch

In this study, we present stakeholder analyses of firm-related web forums. Prior analyses of firm-related forums have considered all participants in the aggregate, failing to recognize the potential for diversity within the populations. However, distinctive groups of forum participants may represent various interests and stakes in a firm worthy of consideration. To perform the stakeholder analyses, the Stakeholder Analyzer system for firm-related web forums is developed following the design science paradigm of information systems research. The design of the system and its approach to stakeholder analysis are guided by two kernel theories, the stakeholder theory of the firm and the systemic functional linguistic theory. A stakeholder analysis identifies distinctive groups of forum participants with shared characteristics expressed in discussion and evaluates their specific opinions and interests in the firm. Stakeholder analyses are performed in six major firm-related forums hosted on Yahoo Finance over a 3-month period. The relationships between measures extracted from the forums and subsequent daily firm stock returns are examined using multiple linear regression models, revealing statistically significant indicators of firm stock returns in the discussions of the stakeholder groups of each firm, with stakeholder-model adjusted R² values reaching 0.83. Daily stock return prediction is also performed for 31 trading days, and stakeholder models correctly predicted the direction of return on 67% of trading days and generated an impressive 17% return in simulated trading of the six firm stocks. These evaluations demonstrate that the stakeholder analyses provided more refined assessments of the firm-related forums, yielding measures at the stakeholder group level that better explain and predict daily firm stock returns than aggregate forum-level information.
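
The two evaluation measures named in the abstract, adjusted R² and directional accuracy, can be computed as in the sketch below. The adjusted R² formula is the standard one for a linear regression with an intercept; the directional accuracy definition (share of days on which predicted and actual returns have the same sign) is an assumption about how "direction of return" was scored, and the return values are made-up illustrations, not data from the study.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R^2 for a regression with an intercept."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def directional_accuracy(predicted, actual):
    """Share of days where predicted and actual returns share a sign.

    Assumption: a return counts as "up" only when strictly positive.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(actual)


# Illustrative daily returns (hypothetical values).
predicted_returns = [0.010, -0.004, 0.007, -0.002]
actual_returns = [0.006, 0.003, 0.012, -0.009]
accuracy = directional_accuracy(predicted_returns, actual_returns)
```

Adjusted R² penalizes additional predictors, which matters when group-level models use more forum measures than an aggregate model does.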

Citations: 18

Ontology-Based Mapping for Automated Document Management: A Concept-Based Technique for Word Mismatch and Ambiguity Problems in Document Clustering

ACM Trans. Manag. Inf. Syst.

Pub Date : 2015-04-03 DOI: 10.1145/2688488

Yen-Hsien Lee, P. H. Hu, Ching-Yi Tu

Document clustering is crucial to automated document management, especially for the fast-growing volume of textual documents available digitally. Traditional lexicon-based approaches depend on document content analysis and measure overlap of the feature vectors representing different documents, which cannot effectively address word mismatch or ambiguity problems. Alternative query expansion and local context discovery approaches are developed but suffer from limited efficiency and effectiveness, because the large number of expanded terms create noise and increase the dimensionality and complexity of the overall feature space. Several techniques extend lexicon-based analysis by incorporating latent semantic indexing but produce less comprehensible clustering results and questionable performance. We instead propose a concept-based document representation and clustering (CDRC) technique and empirically examine its effectiveness using 433 articles concerning information systems and technology, randomly selected from a popular digital library. Our evaluation includes two widely used benchmark techniques and shows that CDRC outperforms them. Overall, our results reveal that clustering documents at an ontology-based, concept-based level is more effective than techniques using lexicon-based document features and can generate more comprehensible clustering results.
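
The word mismatch problem described above can be seen in a toy comparison of term-level and concept-level document vectors. The sketch is not the CDRC technique itself: the `CONCEPTS` dictionary is a hypothetical stand-in for an ontology, and cosine similarity is assumed as the overlap measure.

```python
import math
from collections import Counter

# Hypothetical toy ontology: surface term -> concept label.
CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "purchase": "buy",
    "buy": "buy",
}


def term_vector(text):
    """Lexicon-based representation: raw term counts."""
    return Counter(text.lower().split())


def concept_vector(text):
    """Concept-based representation: terms mapped to concept labels.

    Terms missing from the toy ontology are kept unchanged.
    """
    return Counter(CONCEPTS.get(tok, tok) for tok in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc1 = "buy car"
doc2 = "purchase automobile"
lexical_sim = cosine(term_vector(doc1), term_vector(doc2))
concept_sim = cosine(concept_vector(doc1), concept_vector(doc2))
```

The two documents share no terms, so their lexical similarity is zero, yet they map to the same concepts and become near-identical at the concept level. A real ontology would also need to handle ambiguity (one term mapping to several concepts), which this sketch does not attempt.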

Citations: 4

A Network Behavior-Based Botnet Detection Mechanism Using PSO and K-means

ACM Trans. Manag. Inf. Syst.

Pub Date : 2015-04-03 DOI: 10.1145/2676869

Shing-Han Li, Yucheng Kao, Zongshen Zhang, Ying-Ping Chuang, D. Yen

Botnets have become one of the greatest threats to network security. Network attackers, or botmasters, use botnets to launch Distributed Denial of Service (DDoS) attacks that paralyze large-scale websites, or to steal confidential data from infected computers. They also employ “phishing” attacks to steal sensitive information (such as users’ accounts and passwords), send bulk email advertising, and/or conduct click fraud. Even though detection technology has improved considerably and several Internet security solutions have been proposed and refined, the botnet threat persists. Most past studies of this issue used either packet contents or traffic flow characteristics to identify botnet intrusions. However, many problems remain in the areas of packet encryption and data privacy, because a botnet can easily change its packet contents and flow characteristics to circumvent an Intrusion Detection System (IDS). This study combines the Particle Swarm Optimization (PSO) and K-means algorithms to address those problems and develops, step by step, a mechanism for botnet detection. First, three important network behaviors are identified: long active communication behavior (ActBehavior), connection failure behavior (FailBehavior), and network scanning behavior (ScanBehavior). These behaviors are defined according to relevant prior studies and are used to analyze the communication activities among infected computers. Second, the features of these network behaviors are extracted from flow traces in the network layer and transport layer of the network equipment. Third, PSO and K-means techniques are used to uncover the host members of a botnet in the organizational network. The experiments mainly use the flow traces of a campus network. The experimental findings show that the proposed approach can detect suspicious botnet members earlier than the detection application systems. In addition, the proposed approach is easy to implement and can be further used and extended in campus dormitory networks, home networks, and mobile 3G networks.
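
The clustering step can be sketched with plain K-means over per-host behavior scores. This covers only the K-means half of the mechanism: the PSO component is omitted and the initial centroids are passed in explicitly instead. The feature layout (one normalized score each for ActBehavior, FailBehavior, and ScanBehavior) and the host values are hypothetical, not taken from the study.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])),
    )


def kmeans(points, centroids, max_iter=50):
    """Lloyd's K-means with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical hosts: (ActBehavior, FailBehavior, ScanBehavior), each in [0, 1].
hosts = [
    (0.1, 0.0, 0.1),
    (0.2, 0.1, 0.0),
    (0.9, 0.8, 0.9),
    (0.8, 0.9, 1.0),
]
centroids, labels = kmeans(hosts, [hosts[0], hosts[2]])
```

K-means is sensitive to its starting centroids, which is one common motivation for pairing it with a global search method such as PSO; how the two are combined in the paper is not specified in the abstract.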

Citations: 42

A Case Study of Data Quality in Text Mining Clinical Progress Notes

ACM Trans. Manag. Inf. Syst.

Pub Date : 2015-04-03 DOI: 10.1145/2669368

D. Berndt, J. McCart, Dezon K. Finch, S. Luther

Text analytic methods often aim to extract useful information from the vast array of unstructured, free-format text documents created by almost all organizational processes. The success of any text mining application rests on the quality of the underlying data being analyzed, including both predictive features and outcome labels. In this case study, focused experiments on data quality are used to assess the robustness of Statistical Text Mining (STM) algorithms when applied to clinical progress notes. In particular, the experiments consider the impacts of task complexity (by removing signals), training set size, and target outcome quality. Although this research uses a dataset drawn from the medical domain, the data quality issues explored are of more general interest.
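One common way to study target outcome quality is to degrade the labels on purpose and watch how a model responds. The sketch below flips a fixed fraction of binary outcome labels; the function name `flip_labels`, the noise rate, and the toy labels are assumptions for illustration, and the abstract does not say that the study used this exact procedure.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary (0/1) labels with a fixed fraction flipped."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    n_flip = round(noise_rate * len(labels))
    # Choose distinct positions to corrupt, then invert those labels.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


# Hypothetical outcome labels for ten progress notes.
clean = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
noisy = flip_labels(clean, noise_rate=0.2)
n_changed = sum(a != b for a, b in zip(clean, noisy))
print(n_changed)
```

Training the same classifier on `clean` and on `noisy` at several noise rates, and separately on shrinking subsets of the training data, gives a simple robustness curve for each data quality factor.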

Citations: 20

Predicting Stability of Open-Source Software Systems Using Combination of Bayesian Classifiers

ACM Trans. Manag. Inf. Syst.

Pub Date : 2014-04-01 DOI: 10.1145/2555596

S. Bouktif, H. Sahraoui, F. Ahmed

The use of free and Open-Source Software (OSS) systems is gaining momentum. Organizations are now adopting OSS as well, despite some reservations, particularly about quality. Software stability is one of the main attributes in software quality management that needs to be understood and accurately predicted. It concerns the impact of software changes, and the premise is that stable components lead to cost-effective software evolution. Changes are more frequent in OSS than in proprietary software, which makes OSS system evolution a rich context in which to study and predict stability. Our objective in this work is to build stability prediction models that are not only accurate but also interpretable, that is, able to explain the link between the architectural aspects of a software component and its stability behavior in the context of OSS. To that end, we propose a new approach based on classifier combination that preserves prediction interpretability. Because the approach depends on classifier structure, we propose a particular solution for combining Bayesian classifiers to derive a more accurate composite classifier that remains interpretable. This solution is implemented using a genetic algorithm and applied to a large-scale OSS system, namely the standard Java API. The empirical results show that our approach outperforms state-of-the-art approaches from both machine learning and software engineering.
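For readers unfamiliar with classifier combination, the sketch below shows weighted soft voting, a generic way to merge the class probabilities of several classifiers. It is not the structure-level combination of Bayesian classifiers that the paper proposes, and no genetic algorithm is involved; the posteriors, the weights, and the function name `combine_posteriors` are hypothetical.

```python
def combine_posteriors(posteriors, weights):
    """Weighted average of per-classifier class-probability dictionaries."""
    total = sum(weights)
    classes = posteriors[0].keys()
    return {
        c: sum(w * p[c] for w, p in zip(weights, posteriors)) / total
        for c in classes
    }


# Hypothetical posteriors from three classifiers for one software component.
posteriors = [
    {"stable": 0.70, "unstable": 0.30},
    {"stable": 0.40, "unstable": 0.60},
    {"stable": 0.80, "unstable": 0.20},
]
# Assumed weights, e.g. derived from validation accuracy or tuned by a search.
weights = [0.5, 0.2, 0.3]

combined = combine_posteriors(posteriors, weights)
prediction = max(combined, key=combined.get)
print(prediction, combined)
```

A search procedure such as a genetic algorithm could tune `weights` against validation accuracy, but averaging outputs in this way does not by itself keep the combined model interpretable, which is the property the paper sets out to preserve.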

Citations: 8