Proceedings of the ... International World-Wide Web Conference. International WWW Conference: Latest Articles

Goal-setting And Achievement In Activity Tracking Apps: A Case Study Of MyFitnessPal.
Mitchell L Gordon, Tim Althoff, Jure Leskovec

Activity tracking apps often make use of goals as one of their core motivational tools. There are two critical components to this tool: setting a goal, and subsequently achieving that goal. Despite its crucial role in how a number of prominent self-tracking apps function, there has been relatively little investigation of the goal-setting and achievement aspects of self-tracking apps. Here we explore this issue, investigating a particular goal-setting and achievement process that is extensive, recorded, and crucial for both the app and its users' success: weight loss goals in MyFitnessPal. We present a large-scale study of 1.4 million users and weight loss goals, allowing for an unprecedentedly detailed view of how people set and achieve their goals. We find that, even for difficult long-term goals, behavior within the first 7 days distinguishes those who ultimately achieve their goals (that is, those who lose at least as much weight as they set out to) from those who do not. For instance, high amounts of early weight loss, which some researchers have classified as unsustainable, lead to higher goal achievement rates. We also show that early food intake, self-monitoring motivation, and attitude towards the goal are important factors. We then show that we can use our findings to predict goal achievement with a ROC AUC of 79% just 7 days after a goal is set. Finally, we discuss how our findings could inform steps to improve goal achievement in self-tracking apps.

DOI: 10.1145/3308558.3313432 (https://doi.org/10.1145/3308558.3313432). Volume 2019, pages 571-582. Published 2019-05-01.
Citations: 35
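The abstract above reports predictive performance as ROC AUC. As a reminder of what that metric measures, here is a minimal sketch, not the authors' evaluation code: it computes ROC AUC as the probability that a randomly chosen positive example (a user who achieved the goal) receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = goal achieved, a hypothetical encoding).
    scores: iterable of model scores, higher meaning "more likely to achieve".
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: one positive (score 0.2) is ranked below one negative (score 0.8).
print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.8, 0.1]))
```

The pairwise loop is quadratic and only suited to small samples; a rank-based formulation gives the same value for large datasets.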
Modeling Interdependent and Periodic Real-World Action Sequences.
Takeshi Kurashima, Tim Althoff, Jure Leskovec

Mobile health applications, including those that track activities such as exercise, sleep, and diet, are becoming widely used. Accurately predicting human actions in the real world is essential for targeted recommendations that could improve our health and for personalization of these applications. However, making such predictions is extremely difficult due to the complexities of human behavior, which consists of a large number of potential actions that vary over time, depend on each other, and are periodic. Previous work has not jointly modeled these dynamics and has largely focused on item consumption patterns instead of broader types of behaviors such as eating, commuting or exercising. In this work, we develop a novel statistical model, called TIPAS, for Time-varying, Interdependent, and Periodic Action Sequences. Our approach is based on personalized, multivariate temporal point processes that model time-varying action propensities through a mixture of Gaussian intensities. Our model captures short-term and long-term periodic interdependencies between actions through Hawkes process-based self-excitations. We evaluate our approach on two activity logging datasets comprising 12 million real-world actions (e.g., eating, sleep, and exercise) taken by 20 thousand users over 17 months. We demonstrate that our approach allows us to make successful predictions of future user actions and their timing. Specifically, TIPAS improves predictions of actions, and their timing, over existing methods across multiple datasets by up to 156%, and up to 37%, respectively. Performance improvements are particularly large for relatively rare and periodic actions such as walking and biking, improving over baselines by up to 256%. This demonstrates that explicit modeling of dependencies and periodicities in real-world behavior enables successful predictions of future actions, with implications for modeling human behavior, app personalization, and targeting of health interventions.

DOI: 10.1145/3178876.3186161. Volume 2018, pages 803-812. Published 2018-04-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5959287/pdf/nihms958398.pdf
Citations: 0
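The abstract above combines two ingredients: a time-varying base rate built from a mixture of Gaussian intensities, and Hawkes-style self-excitation from past actions. The sketch below illustrates how the two can be added into a single intensity function. It is a simplified, single-action illustration and not the TIPAS model itself: the exponential decay kernel, the parameter names `alpha` and `beta`, and the example values are all assumptions.

```python
import math


def gaussian_mixture_rate(t, components):
    """Base rate at time t: a weighted sum of Gaussian bumps.

    components: list of (weight, mean, std) tuples (hypothetical parameters).
    """
    return sum(
        w * math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for w, mu, sd in components
    )


def hawkes_intensity(t, history, components, alpha, beta):
    """Intensity at time t = base rate + excitation from earlier events.

    Each past event at time s < t adds alpha * exp(-beta * (t - s)).
    The exponential kernel is an assumption chosen for simplicity.
    """
    excitation = sum(alpha * math.exp(-beta * (t - s)) for s in history if s < t)
    return gaussian_mixture_rate(t, components) + excitation


# One Gaussian bump centred at hour 12; one earlier event at hour 11.
components = [(2.0, 12.0, 1.0)]
print(hawkes_intensity(12.0, [11.0], components, alpha=0.5, beta=1.0))
```

In a multivariate version, the excitation term would also sum over events of other action types, which is how interdependencies between actions could enter.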
Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs.
Xiaofeng Yang, Patrick K Nicholson, Deepak Ajwani, Mirek Riedewald, Wolfgang Gatterbauer, Alessandra Sala

Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to find the top-k matches according to a ranking function over edge and node weights. For users, it is difficult to select value k. We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, return as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. It can be stopped anytime, but may have to continue until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) that the users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query, and incremental guided search. It enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times in the order of milliseconds for tree patterns on large networks with millions of nodes and edges.

DOI: 10.1145/3178876.3186115. Volume 2018, pages 489-498. Published 2018-04-01.
Citations: 11
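The any-k idea described above (return results in rank order for as long as a time budget allows, without fixing k in advance) can be illustrated on a much simpler problem than tree patterns in labeled graphs. The sketch below is not KARPET: it lazily enumerates pairs from two sorted weight lists in increasing total weight, and a separate helper consumes that stream until a deadline. The function names, the "lower weight ranks first" convention, and the toy inputs are assumptions.

```python
import heapq
import time


def ranked_pairs(a, b):
    """Lazily yield (weight, i, j) over all pairs in non-decreasing a[i] + b[j].

    a and b must each be sorted in ascending order. Only the frontier of
    candidate pairs is kept in the heap, so early results arrive quickly.
    """
    if not a or not b:
        return
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        weight, i, j = heapq.heappop(heap)
        yield weight, i, j
        # Each successor weighs at least as much as the pair just emitted.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))


def any_k(results, budget_seconds):
    """Collect ranked results until the budget expires or results run out."""
    deadline = time.monotonic() + budget_seconds
    collected = []
    for result in results:
        collected.append(result)
        if time.monotonic() >= deadline:
            break
    return collected


# With a generous budget, the whole ranking is returned in order.
print(any_k(ranked_pairs([1, 3, 5], [2, 4]), budget_seconds=1.0))
```

Because the enumeration is a generator, a caller that wants more results after the first budget expires can keep pulling from the same generator object instead of restarting the search.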
I'll Be Back: On the Multiple Lives of Users of a Mobile Activity Tracking Application.
Zhiyuan Lin, Tim Althoff, Jure Leskovec

Mobile health applications that track activities, such as exercise, sleep, and diet, are becoming widely used. While these activity tracking applications have the potential to improve our health, user engagement and retention are critical factors for their success. However, long-term user engagement patterns in real-world activity tracking applications are not yet well understood. Here we study user engagement patterns within a mobile physical activity tracking application consisting of 115 million logged activities taken by over a million users over 31 months. Specifically, we show that over 75% of users return and re-engage with the application after prolonged periods of inactivity, no matter the duration of the inactivity. We find a surprising result that the re-engagement usage patterns resemble those of the start of the initial engagement period, rather than being a simple continuation of the end of the initial engagement period. This evidence points to a conceptual model of multiple lives of user engagement, extending the prevalent single life view of user activity. We demonstrate that these multiple lives occur because the users have a variety of different primary intents or goals for using the app. These primary intents are associated with how long each life lasts and how likely the user is to re-engage for a new life. We find evidence for users being more likely to stop using the app once they achieved their primary intent or goal (e.g., weight loss). However, these users might return once their original intent resurfaces (e.g., wanting to lose newly gained weight). We discuss implications of the multiple life paradigm and propose a novel prediction task of predicting the number of lives of a user. Based on insights developed in this work, including a marker of improved primary intent performance, our prediction models achieve 71% ROC AUC. Overall, our research has implications for modeling user re-engagement in health activity tracking applications and has consequences for how notifications, recommendations as well as gamification can be used to increase engagement.

DOI: 10.1145/3178876.3186062 (https://doi.org/10.1145/3178876.3186062). Volume 2018, pages 1501-1511. Published 2018-04-01.
Citations: 25
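The multiple-lives view above treats a user's history as several engagement periods separated by prolonged inactivity. One simple way to operationalize that idea is to split a user's active days wherever the gap between consecutive days exceeds a threshold. The sketch below does exactly that; the abstract does not state how lives are delimited, so the function name and the 30-day default threshold are purely hypothetical.

```python
def split_into_lives(active_days, max_gap_days=30):
    """Group day indices into 'lives' separated by long inactivity gaps.

    active_days: iterable of integer day indices on which the user logged
        at least one activity (hypothetical input format).
    max_gap_days: gaps strictly longer than this start a new life; the
        default of 30 is an illustrative assumption, not a published value.
    """
    lives = []
    for day in sorted(active_days):
        if lives and day - lives[-1][-1] <= max_gap_days:
            lives[-1].append(day)  # still within the current life
        else:
            lives.append([day])  # first day, or return after a long gap
    return lives


# Three bursts of activity separated by gaps of 47 and 149 days.
print(split_into_lives([1, 2, 3, 50, 51, 200]))
```

The number of lives per user, `len(split_into_lives(...))`, is then the kind of quantity a prediction task like the one described could target.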
Modeling Individual Cyclic Variation in Human Behavior.
Emma Pierson, Tim Althoff, Jure Leskovec

Cycles are fundamental to human health and behavior. Examples include mood cycles, circadian rhythms, and the menstrual cycle. However, modeling cycles in time series data is challenging because in most cases the cycles are not labeled or directly observed and need to be inferred from multidimensional measurements taken over time. Here, we present Cyclic Hidden Markov Models (CyHMMs) for detecting and modeling cycles in a collection of multidimensional heterogeneous time series data. In contrast to previous cycle modeling methods, CyHMMs deal with a number of challenges encountered in modeling real-world cycles: they can model multivariate data with both discrete and continuous dimensions; they explicitly model and are robust to missing data; and they can share information across individuals to accommodate variation both within and between individual time series. Experiments on synthetic and real-world health-tracking data demonstrate that CyHMMs infer cycle lengths more accurately than existing methods, with 58% lower error on simulated data and 63% lower error on real-world data compared to the best-performing baseline. CyHMMs can also perform functions which baselines cannot: they can model the progression of individual features/symptoms over the course of the cycle, identify the most variable features, and cluster individual time series into groups with distinct characteristics. Applying CyHMMs to two real-world health-tracking datasets (human menstrual cycle symptoms and physical activity tracking data) yields important insights, including which symptoms to expect at each point during the cycle. We also find that people fall into several groups with distinct cycle patterns, and that these groups differ along dimensions not provided to the model. For example, by modeling missing data in the menstrual cycles dataset, we are able to discover a medically relevant group of birth control users even though information on birth control is not given to the model.

DOI: 10.1145/3178876.3186052. Volume 2018, pages 107-116. Published 2018-04-01.
Citations: 33
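A hidden Markov model becomes cyclic when its hidden states are arranged in a ring, so that each state can only persist or advance to the next one and the last state wraps back to the first. The sketch below builds such a transition matrix and derives the expected cycle length it implies. It illustrates the general cyclic topology only and is not the CyHMM described above, which additionally handles mixed discrete and continuous emissions, missing data, and sharing across individuals. The function names and the single shared `stay_prob` parameter are assumptions.

```python
def cyclic_transition_matrix(n_states, stay_prob):
    """Row-stochastic matrix for a ring of hidden states.

    State i stays put with probability stay_prob and otherwise advances
    to state (i + 1) % n_states. Using one shared stay_prob for every
    state is a simplifying assumption.
    """
    if n_states < 2:
        raise ValueError("a cycle needs at least two states")
    if not 0.0 <= stay_prob < 1.0:
        raise ValueError("stay_prob must be in [0, 1)")
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        matrix[i][i] = stay_prob
        matrix[i][(i + 1) % n_states] = 1.0 - stay_prob
    return matrix


def expected_cycle_length(n_states, stay_prob):
    """Expected number of time steps for one full trip around the ring.

    Time spent in each state is geometric with mean 1 / (1 - stay_prob),
    and a full cycle visits every state once.
    """
    return n_states / (1.0 - stay_prob)


# Four phases, each lasting four steps on average.
print(cyclic_transition_matrix(4, 0.75))
print(expected_cycle_length(4, 0.75))
```

Letting the stay probability vary by state or by individual is one way such a model could represent differences in cycle length between people.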
PhLeGrA: Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data.
Maulik R Kamdar, Mark A Musen

Integrated approaches for pharmacology are required for the mechanism-based predictions of adverse drug reactions that manifest due to concomitant intake of multiple drugs. These approaches require the integration and analysis of biomedical data and knowledge from multiple, heterogeneous sources with varying schemas, entity notations, and formats. To tackle these integrative challenges, the Semantic Web community has published and linked several datasets in the Life Sciences Linked Open Data (LSLOD) cloud using established W3C standards. We present the PhLeGrA platform for Linked Graph Analytics in Pharmacology in this paper. Through query federation, we integrate four sources from the LSLOD cloud and extract a drug-reaction network, composed of distinct entities. We represent this graph as a hidden conditional random field (HCRF), a discriminative latent variable model that is used for structured output predictions. We calculate the underlying probability distributions in the drug-reaction HCRF using the datasets from the U.S. Food and Drug Administration's Adverse Event Reporting System. We predict the occurrence of 146 adverse reactions due to multiple drug intake with an AUROC statistic greater than 0.75. The PhLeGrA platform can be extended to incorporate other sources published using Semantic Web technologies, as well as to discover other types of pharmacological associations.

DOI: 10.1145/3038912.3052692. Volume 2017, pages 321-329. Published 2017-04-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5824722/pdf/
Citations: 0
How Gamification Affects Physical Activity: Large-scale Analysis of Walking Challenges in a Mobile Application.
Ali Shameli, Tim Althoff, Amin Saberi, Jure Leskovec

Gamification represents an effective way to incentivize user behavior across a number of computing applications. However, despite the fact that physical activity is essential for a healthy lifestyle, surprisingly little is known about how gamification, and in particular competitions, shape human physical activity. Here we study how competitions affect physical activity. We focus on walking challenges in a mobile activity tracking application where multiple users compete over who takes the most steps over a predefined number of days. We synthesize our findings in a series of game and app design implications. In particular, we analyze nearly 2,500 physical activity competitions over a period of one year capturing more than 800,000 person-days of activity tracking. We observe that during walking competitions, the average user increases physical activity by 23%. Furthermore, there are large increases in activity for both men and women across all ages and weight statuses, and even for users who were previously fairly inactive. We also find that the composition of participants greatly affects the dynamics of the game. In particular, if highly unequal participants get matched to each other, then competition suffers and the overall effect on physical activity drops significantly. Furthermore, competitions with an equal mix of both men and women are more effective in increasing the level of activity. We leverage these insights to develop a statistical model to predict, with significant accuracy, whether a competition will be particularly engaging. Our models can serve as a guideline to help design more engaging competitions that lead to the most beneficial behavioral changes.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference, vol. 2017, pp. 455-463. Published 2017-04-01. DOI: https://doi.org/10.1145/3041021.3054172
Citations: 58
Cataloguing Treatments Discussed and Used in Online Autism Communities.
Shaodian Zhang, Tian Kang, Lin Qiu, Weinan Zhang, Yong Yu, Noémie Elhadad

A large number of patients discuss treatments in online health communities (OHCs). One research question of interest to health researchers is whether treatments discussed in OHCs are eventually used by community members in their real lives. In this paper, we rely on machine learning methods to automatically identify the attributions of treatment mentions in an online autism community. The context of our work is online autism communities, where parents exchange support for the care of their children with autism spectrum disorder. Our methods are able to distinguish discussions of treatments that are associated with patients, caregivers, and others, as well as identify whether a treatment is actually taken. We investigate treatments that are not just discussed but also used by patients through two types of content analysis, cross-sectional and longitudinal. The treatments identified through our content analysis help create a catalogue of real-world treatments. The results of this study lay the foundation for future research comparing real-world drug usage with established clinical guidelines.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference, vol. 2017, pp. 123-131. Published 2017-04-01. DOI: 10.1145/3038912.3052661. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5516208/pdf/
Citations: 0
"We make choices we think are going to save us": Debate and stance identification for online breast cancer CAM discussions.
Shaodian Zhang, Lin Qiu, Frank Chen, Weinan Zhang, Yong Yu, Noémie Elhadad

Patients discuss complementary and alternative medicine (CAM) in online health communities. Sometimes, patients' conflicting opinions toward CAM-related issues trigger debates in the community. The objectives of this paper are to identify such debates, to identify controversial CAM therapies in a popular online breast cancer community, and to identify patients' stances toward them. To scale our analysis, we trained a set of classifiers. We first constructed a supervised classifier based on a long short-term memory neural network (LSTM) stacked over a convolutional neural network (CNN) to automatically detect CAM-related debates in a popular breast cancer forum. Members' stances in these debates were also identified by a CNN-based classifier. Finally, posts automatically flagged as debates by the classifier were analyzed to explore which specific CAM therapies trigger debates more often than others. Our methods are able to detect CAM debates with an F score of 77% and identify stances with an F score of 70%. The debate classifier identified about 1/6 of all CAM-related posts as debates. About 60% of CAM-related debate posts take a supportive stance toward CAM usage. Qualitative analysis shows that some specific therapies, such as Gerson therapy and the use of laetrile, frequently trigger debates among members of the breast cancer community. This study demonstrates that neural networks can effectively locate debates on the usage and effectiveness of controversial CAM therapies and can help make sense of patients' opinions on such disputed issues. Perceptions of the effectiveness of CAM for breast cancer vary among patients. Many of the specific therapies trigger debates frequently and merit further exploration in future work.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference, vol. 2017, pp. 1073-1081. Published 2017-04-01. DOI: https://doi.org/10.1145/3041021.3055134
Citations: 26
Representing Documents via Latent Keyphrase Inference.
Jialu Liu, Xiang Ren, Jingbo Shang, Taylor Cassidy, Clare R Voss, Jiawei Han

Many text mining approaches adopt bag-of-words or n-gram models to represent documents. Looking beyond just the words in a document, i.e., the explicit surface forms, can improve a computer's understanding of text. Aware of this, researchers have proposed concept-based models that rely on a human-curated knowledge base to incorporate other related concepts into the document representation. But these methods are not desirable when applied to vertical domains (e.g., literature, enterprise, etc.) due to low coverage of in-domain concepts in the general knowledge base and interference from out-of-domain concepts. In this paper, we propose a data-driven model named Latent Keyphrase Inference (LAKI) that represents documents with a vector of closely related domain keyphrases instead of single words or existing concepts in the knowledge base. We show that, given a corpus of in-domain documents, topical content units can be learned for each domain keyphrase, which enables a computer to do smart inference to discover latent document keyphrases, going beyond just explicit mentions. Compared with state-of-the-art document representation approaches, LAKI fills the gap between bag-of-words and concept-based models by using domain keyphrases as the basic representation unit. It removes the dependency on a knowledge base while providing, with keyphrases, readily interpretable representations. When evaluated against 8 other methods on two text mining tasks over two corpora, LAKI outperformed all of them.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference, vol. 2016, pp. 1057-1067. Published 2016-04-01. DOI: https://doi.org/10.1145/2872427.2883088
Citations: 24