Proceedings of the ... International World-Wide Web Conference. International WWW Conference: Latest Articles

Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2018-04-01 DOI: 10.1145/3178876.3186115

Xiaofeng Yang, Patrick K Nicholson, Deepak Ajwani, Mirek Riedewald, Wolfgang Gatterbauer, Alessandra Sala

Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to find the top-k matches according to a ranking function over edge and node weights. For users, it is difficult to select a value for k. We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, return as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. The algorithm can be stopped at any time, but may have to continue until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) the fact that users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query with incremental guided search. It enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times on the order of milliseconds for tree patterns on large networks with millions of nodes and edges.
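The any-k idea in the abstract above (return results in rank order for as long as time allows, then resume) can be illustrated with a minimal sketch. This is not KARPET: it omits the pruning and guided search the abstract mentions and simply runs a best-first search over partial matches of a path pattern, assuming non-negative edge weights and a "lower total weight ranks higher" convention. The function names, the graph encoding, and the toy data are all hypothetical.

```python
import heapq
import time
from itertools import count


def ranked_path_matches(edges, labels, pattern):
    """Lazily yield (total_weight, path) matches of a path pattern, cheapest first.

    edges:   dict node -> list of (neighbor, weight); weights must be non-negative
    labels:  dict node -> label
    pattern: list of labels that consecutive nodes on the path must carry
    """
    tie = count()  # tie-breaker so the heap never has to compare paths
    heap = [(0.0, next(tie), (n,)) for n, lab in labels.items() if lab == pattern[0]]
    heapq.heapify(heap)
    while heap:
        cost, _, path = heapq.heappop(heap)
        if len(path) == len(pattern):
            # Complete match: nothing left in the heap can extend to a cheaper one.
            yield cost, path
            continue
        wanted = pattern[len(path)]
        for nxt, weight in edges.get(path[-1], []):
            if labels[nxt] == wanted:
                heapq.heappush(heap, (cost + weight, next(tie), path + (nxt,)))


def any_k(ranked, budget_seconds):
    """Collect top-ranked results until the time budget is used up.

    Always takes at least one result if any exists; the generator stays
    resumable, so a later call continues with the next lower-ranked results.
    """
    deadline = time.monotonic() + budget_seconds
    results = []
    for item in ranked:
        results.append(item)
        if time.monotonic() >= deadline:
            break
    return results


# Toy heterogeneous graph (hypothetical data): authors -> papers -> venues.
edges = {
    "a1": [("p1", 1.0), ("p2", 3.0)],
    "a2": [("p1", 0.5)],
    "p1": [("v1", 2.0)],
    "p2": [("v1", 0.25)],
}
labels = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper", "v1": "venue"}

matches = ranked_path_matches(edges, labels, ["author", "paper", "venue"])
first_batch = any_k(matches, 0.0)   # budget exhausted after the best result
second_batch = any_k(matches, 1.0)  # more time: the next-ranked results follow
```

Because the ranking is produced by a generator, the caller never has to choose k up front; stopping early and asking for more later are the same operation.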


Volume : 2018 Pages : 489-498

Citations: 11

I'll Be Back: On the Multiple Lives of Users of a Mobile Activity Tracking Application.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2018-04-01 DOI: 10.1145/3178876.3186062

Zhiyuan Lin, Tim Althoff, Jure Leskovec

Mobile health applications that track activities, such as exercise, sleep, and diet, are becoming widely used. While these activity tracking applications have the potential to improve our health, user engagement and retention are critical factors for their success. However, long-term user engagement patterns in real-world activity tracking applications are not yet well understood. Here we study user engagement patterns within a mobile physical activity tracking application consisting of 115 million logged activities taken by over a million users over 31 months. Specifically, we show that over 75% of users return and re-engage with the application after prolonged periods of inactivity, no matter the duration of the inactivity. We find a surprising result that the re-engagement usage patterns resemble those of the start of the initial engagement period, rather than being a simple continuation of the end of the initial engagement period. This evidence points to a conceptual model of multiple lives of user engagement, extending the prevalent single life view of user activity. We demonstrate that these multiple lives occur because the users have a variety of different primary intents or goals for using the app. These primary intents are associated with how long each life lasts and how likely the user is to re-engage for a new life. We find evidence for users being more likely to stop using the app once they have achieved their primary intent or goal (e.g., weight loss). However, these users might return once their original intent resurfaces (e.g., wanting to lose newly gained weight). We discuss implications of the multiple life paradigm and propose a novel task of predicting the number of lives of a user. Based on insights developed in this work, including a marker of improved primary intent performance, our prediction models achieve 71% ROC AUC. Overall, our research has implications for modeling user re-engagement in health activity tracking applications and has consequences for how notifications, recommendations, and gamification can be used to increase engagement.
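For readers unfamiliar with the ROC AUC figure quoted above, the following is a small sketch of how the metric can be computed for a binary prediction task. It uses the pairwise interpretation (the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half). The labels and scores are made-up illustrations, not data from the paper, and the paper's actual evaluation setup may differ.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = user returns for another "life", 0 = does not.
example_labels = [1, 1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
example_auc = roc_auc(example_labels, example_scores)  # 5 of 6 pairs ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the reported 71% in context.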


Volume : 2018 Pages : 1501-1511

Citations: 25

Modeling Individual Cyclic Variation in Human Behavior.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2018-04-01 DOI: 10.1145/3178876.3186052

Emma Pierson, Tim Althoff, Jure Leskovec

Cycles are fundamental to human health and behavior. Examples include mood cycles, circadian rhythms, and the menstrual cycle. However, modeling cycles in time series data is challenging because in most cases the cycles are not labeled or directly observed and need to be inferred from multidimensional measurements taken over time. Here, we present Cyclic Hidden Markov Models (CyHMMs) for detecting and modeling cycles in a collection of multidimensional heterogeneous time series data. In contrast to previous cycle modeling methods, CyHMMs deal with a number of challenges encountered in modeling real-world cycles: they can model multivariate data with both discrete and continuous dimensions; they explicitly model and are robust to missing data; and they can share information across individuals to accommodate variation both within and between individual time series. Experiments on synthetic and real-world health-tracking data demonstrate that CyHMMs infer cycle lengths more accurately than existing methods, with 58% lower error on simulated data and 63% lower error on real-world data compared to the best-performing baseline. CyHMMs can also perform functions which baselines cannot: they can model the progression of individual features/symptoms over the course of the cycle, identify the most variable features, and cluster individual time series into groups with distinct characteristics. Applying CyHMMs to two real-world health-tracking datasets, one of human menstrual cycle symptoms and one of physical activity tracking data, yields important insights, including which symptoms to expect at each point during the cycle. We also find that people fall into several groups with distinct cycle patterns, and that these groups differ along dimensions not provided to the model. For example, by modeling missing data in the menstrual cycles dataset, we are able to discover a medically relevant group of birth control users even though information on birth control is not given to the model.
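To make the notion of a cyclic hidden Markov model more concrete, here is a minimal sketch of one possible cyclic state structure. The abstract does not spell out the CyHMM transition design, so everything below is an assumption for illustration: hidden states are arranged in a ring, each state either stays put or advances to the next one, and dwell times are therefore geometric. The function names and the example probabilities are hypothetical.

```python
def cyclic_transition_matrix(stay_probs):
    """Build a ring-shaped transition matrix.

    stay_probs[i] is the (assumed) probability of remaining in hidden state i
    for one more time step; otherwise the chain advances to state (i + 1) mod n.
    """
    n = len(stay_probs)
    matrix = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(stay_probs):
        if not 0.0 <= p < 1.0:
            raise ValueError("stay probabilities must lie in [0, 1)")
        matrix[i][i] += p
        matrix[i][(i + 1) % n] += 1.0 - p  # only forward moves, so states form a cycle
    return matrix


def expected_cycle_length(stay_probs):
    """Expected number of time steps for one full trip around the ring.

    With geometric dwell times, state i is occupied for 1 / (1 - p_i) steps on average.
    """
    return sum(1.0 / (1.0 - p) for p in stay_probs)


# Hypothetical three-phase cycle.
phases = [0.5, 0.75, 0.0]
transitions = cyclic_transition_matrix(phases)
mean_length = expected_cycle_length(phases)  # 2 + 4 + 1 = 7 time steps
```

Under this assumed structure, inferring an individual's cycle length amounts to estimating the stay probabilities from that individual's observations, which is one way to read the abstract's claim about cycle-length inference.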


Volume : 2018 Pages : 107-116

Citations: 33

PhLeGrA: Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2017-04-01 DOI: 10.1145/3038912.3052692

Maulik R Kamdar, Mark A Musen

Integrated approaches for pharmacology are required for the mechanism-based predictions of adverse drug reactions that manifest due to concomitant intake of multiple drugs. These approaches require the integration and analysis of biomedical data and knowledge from multiple, heterogeneous sources with varying schemas, entity notations, and formats. To tackle these integrative challenges, the Semantic Web community has published and linked several datasets in the Life Sciences Linked Open Data (LSLOD) cloud using established W3C standards. We present the PhLeGrA platform for Linked Graph Analytics in Pharmacology in this paper. Through query federation, we integrate four sources from the LSLOD cloud and extract a drug-reaction network, composed of distinct entities. We represent this graph as a hidden conditional random field (HCRF), a discriminative latent variable model that is used for structured output predictions. We calculate the underlying probability distributions in the drug-reaction HCRF using the datasets from the U.S. Food and Drug Administration's Adverse Event Reporting System. We predict the occurrence of 146 adverse reactions due to multiple drug intake with an AUROC statistic greater than 0.75. The PhLeGrA platform can be extended to incorporate other sources published using Semantic Web technologies, as well as to discover other types of pharmacological associations.


Volume : 2017 Pages : 321-329 Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5824722/pdf/

Citations: 0

How Gamification Affects Physical Activity: Large-scale Analysis of Walking Challenges in a Mobile Application.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2017-04-01 DOI: 10.1145/3041021.3054172

Ali Shameli, Tim Althoff, Amin Saberi, Jure Leskovec

Gamification represents an effective way to incentivize user behavior across a number of computing applications. However, despite the fact that physical activity is essential for a healthy lifestyle, surprisingly little is known about how gamification and in particular competitions shape human physical activity. Here we study how competitions affect physical activity. We focus on walking challenges in a mobile activity tracking application where multiple users compete over who takes the most steps over a predefined number of days. We synthesize our findings in a series of game and app design implications. In particular, we analyze nearly 2,500 physical activity competitions over a period of one year capturing more than 800,000 person days of activity tracking. We observe that during walking competitions, the average user increases physical activity by 23%. Furthermore, there are large increases in activity for both men and women across all ages and weight statuses, and even for users who were previously fairly inactive. We also find that the composition of participants greatly affects the dynamics of the game. In particular, if highly unequal participants get matched to each other, then competition suffers and the overall effect on physical activity drops significantly. Furthermore, competitions with an equal mix of men and women are more effective in increasing the level of activity. We leverage these insights to develop a statistical model that predicts, with significant accuracy, whether a competition will be particularly engaging. Our models can serve as a guideline to help design more engaging competitions that lead to the most beneficial behavioral changes.


Volume : 2017 Pages : 455-463

Citations: 58

Cataloguing Treatments Discussed and Used in Online Autism Communities.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2017-04-01 DOI: 10.1145/3038912.3052661

Shaodian Zhang, Tian Kang, Lin Qiu, Weinan Zhang, Yong Yu, Noémie Elhadad

A large number of patients discuss treatments in online health communities (OHCs). One research question of interest to health researchers is whether treatments being discussed in OHCs are eventually used by community members in their real lives. In this paper, we rely on machine learning methods to automatically identify attributions of mentions of treatments from an online autism community. The context of our work is online autism communities, where parents exchange support for the care of their children with autism spectrum disorder. Our methods are able to distinguish discussions of treatments that are associated with patients, caregivers, and others, as well as identify whether a treatment is actually taken. We investigate treatments that are not just discussed but also used by patients according to two types of content analysis, cross-sectional and longitudinal. The treatments identified through our content analysis help create a catalogue of real-world treatments. The results of this study lay the foundation for future research to compare real-world drug usage with established clinical guidelines.


Volume : 2017 Pages : 123-131 Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5516208/pdf/

Citations: 0

"We make choices we think are going to save us": Debate and stance identification for online breast cancer CAM discussions. “我们做出我们认为会拯救我们的选择”:在线乳腺癌CAM讨论的辩论和立场识别。

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2017-04-01 DOI: 10.1145/3041021.3055134

Shaodian Zhang, Lin Qiu, Frank Chen, Weinan Zhang, Yong Yu, Noémie Elhadad

Patients discuss complementary and alternative medicine (CAM) in online health communities. Sometimes, patients' conflicting opinions toward CAM-related issues trigger debates in the community. The objectives of this paper are to identify such debates, the controversial CAM therapies discussed in a popular online breast cancer community, and patients' stances towards them. To scale our analysis, we trained a set of classifiers. We first constructed a supervised classifier based on a long short-term memory neural network (LSTM) stacked over a convolutional neural network (CNN) to automatically detect CAM-related debates from a popular breast cancer forum. Members' stances in these debates were also identified by a CNN-based classifier. Finally, posts automatically flagged as debates by the classifier were analyzed to explore which specific CAM therapies trigger debates more often than others. Our methods are able to detect CAM debates with an F score of 77%, and identify stances with an F score of 70%. The debate classifier identified about 1/6 of all CAM-related posts as debate. About 60% of CAM-related debate posts take a supportive stance toward CAM usage. Qualitative analysis shows that some specific therapies, such as Gerson therapy and usage of laetrile, trigger debates frequently among members of the breast cancer community. This study demonstrates that neural networks can effectively locate debates on usage and effectiveness of controversial CAM therapies, and can help make sense of patients' opinions on such issues under dispute. As to CAM for breast cancer, perceptions of its effectiveness vary among patients. Many of the specific therapies trigger debates frequently and are worth more exploration in future work.
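The F scores reported above summarize classifier quality as a balance of precision and recall. The sketch below assumes the common F1 variant (the harmonic mean of precision and recall); the abstract does not state which F-measure was used, and the labels here are invented for illustration only.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = post is a CAM-related debate, 0 = it is not.
gold = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
score = f1_score(gold, predicted)  # precision = recall = 2/3, so F1 = 2/3
```

Unlike plain accuracy, this measure is not inflated when debates make up only a small share of posts (about 1/6 of CAM-related posts here), which is a common reason to report it for tasks like this.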

Volume: 2017 Pages: 1073-1081

Citations: 26

Representing Documents via Latent Keyphrase Inference.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2016-04-01 DOI: 10.1145/2872427.2883088

Jialu Liu, Xiang Ren, Jingbo Shang, Taylor Cassidy, Clare R Voss, Jiawei Han

Many text mining approaches adopt bag-of-words or n-gram models to represent documents. Looking beyond just the words, i.e., the explicit surface forms, in a document can improve a computer's understanding of text. Being aware of this, researchers have proposed concept-based models that rely on a human-curated knowledge base to incorporate other related concepts in the document representation. But these methods are not desirable when applied to vertical domains (e.g., literature, enterprise) due to low coverage of in-domain concepts in the general knowledge base and interference from out-of-domain concepts. In this paper, we propose a data-driven model named Latent Keyphrase Inference (LAKI) that represents documents with a vector of closely related domain keyphrases instead of single words or existing concepts in the knowledge base. We show that, given a corpus of in-domain documents, topical content units can be learned for each domain keyphrase, which enables a computer to do smart inference to discover latent document keyphrases, going beyond just explicit mentions. Compared with state-of-the-art document representation approaches, LAKI fills the gap between bag-of-words and concept-based models by using domain keyphrases as the basic representation unit. It removes dependency on a knowledge base while providing, with keyphrases, readily interpretable representations. When evaluated against 8 other methods on two text mining tasks over two corpora, LAKI outperformed all of them.

Volume: 2016 Pages: 1057-1067

Citations: 24

Identity Management and Mental Health Discourse in Social Media.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2015-05-01 DOI: 10.1145/2740908.2743049

Umashanthi Pavalanathan, Munmun De Choudhury

Social media is increasingly being adopted in health discourse. We examine the role played by identity in supporting discourse on socially stigmatized conditions. Specifically, we focus on mental health communities on reddit. We investigate the characteristics of mental health discourse manifested through reddit's characteristic 'throwaway' accounts, which are used as proxies of anonymity. For this purpose, we propose affective, cognitive, social, and linguistic style measures, drawing from literature in psychology. We observe that mental health discourse from throwaways is considerably disinhibiting and exhibits increased negativity, cognitive bias and self-attentional focus, and lowered self-esteem. Throwaways also seem to be six times more prevalent as an identity choice on mental health forums than in other reddit communities. We discuss the implications of our work in guiding mental health interventions, and in the design of online communities that can better cater to the needs of vulnerable populations. We conclude with thoughts on the role of identity manifestation on social media in behavioral therapy.

Volume: 2015 Companion Pages: 315-321

Citations: 105

Donor Retention in Online Crowdfunding Communities: A Case Study of DonorsChoose.org.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

Pub Date : 2015-05-01 DOI: 10.1145/2736277.2741120

Tim Althoff, Jure Leskovec

Online crowdfunding platforms like DonorsChoose.org and Kickstarter allow specific projects to get funded by targeted contributions from a large number of people. Critical for the success of crowdfunding communities are the recruitment and continued engagement of donors. With donor attrition rates above 70%, a significant challenge for online crowdfunding platforms as well as traditional offline non-profit organizations is the problem of donor retention. We present a large-scale study of millions of donors and donations on DonorsChoose.org, a crowdfunding platform for education projects. Studying an online crowdfunding platform allows for an unprecedented detailed view of how people direct their donations. We explore various factors impacting donor retention, which allows us to identify different groups of donors and quantify their propensity to return for subsequent donations. We find that donors are more likely to return if they had a positive interaction with the receiver of the donation. We also show that this includes appropriate and timely recognition of their support as well as detailed communication of their impact. Finally, we discuss how our findings could inform steps to improve donor retention in crowdfunding communities and non-profit organizations.

Volume: 2015 Pages: 34-44

Citations: 98