
Proceedings of the ... International World-Wide Web Conference. International WWW Conference: Latest Publications

DPAR: Decoupled Graph Neural Networks with Node-Level Differential Privacy.
Pub Date: 2024-05-01 Epub Date: 2024-05-13 DOI: 10.1145/3589334.3645531
Qiuchen Zhang, Hong Kyu Lee, Jing Ma, Jian Lou, Carl Yang, Li Xiong

Graph Neural Networks (GNNs) have achieved great success in learning with graph-structured data. Privacy concerns have also been raised about the trained models, which can expose sensitive information of graphs, including both node features and structure. In this paper, we aim to achieve node-level differential privacy (DP) for training GNNs so that a node and its edges are protected. Node DP is inherently difficult for GNNs because all direct and multi-hop neighbors participate in the gradient calculation for each node via layer-wise message passing, and there is no bound on how many direct and multi-hop neighbors a node can have; existing DP methods therefore incur high privacy cost or poor utility due to high node sensitivity. We propose a Decoupled GNN with Differentially Private Approximate Personalized PageRank (DPAR) for training GNNs with an enhanced privacy-utility tradeoff. The key idea is to decouple the feature projection and message passing via a DP PageRank algorithm that learns the structure information and uses the top-K neighbors determined by the PageRank for feature aggregation. By capturing the most important neighbors for each node and avoiding layer-wise message passing, it bounds the node sensitivity and achieves an improved privacy-utility tradeoff compared to layer-wise perturbation based methods. We theoretically analyze the node DP guarantee for the two processes combined together and empirically demonstrate that DPAR achieves better utility at the same level of node DP than state-of-the-art methods.
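
The decoupling can be pictured in a few lines: a (noisy) personalized-PageRank score matrix supplies each node's top-K neighbors, and feature aggregation then touches only those neighbors instead of layer-wise message passing. The NumPy sketch below is a minimal illustration, not the authors' implementation; the Laplace perturbation, `epsilon`, and the function name `dp_topk_aggregate` are illustrative stand-ins for the paper's DP approximate PPR mechanism.

```python
import numpy as np

def dp_topk_aggregate(ppr, features, k, epsilon, seed=0):
    """Perturb each node's PPR row with Laplace noise (a stand-in for
    the DP mechanism), keep the top-k neighbors by noisy score, and
    average their features, with no layer-wise message passing."""
    rng = np.random.default_rng(seed)
    noisy = ppr + rng.laplace(scale=1.0 / epsilon, size=ppr.shape)
    out = np.zeros_like(features, dtype=float)
    for i in range(ppr.shape[0]):
        topk = np.argsort(noisy[i])[-k:]      # indices of the k largest noisy scores
        out[i] = features[topk].mean(axis=0)  # aggregation touches only these neighbors
    return out

# toy usage: 5 nodes with 3-dim features and a random PPR matrix
rng = np.random.default_rng(1)
ppr = rng.random((5, 5))
X = rng.random((5, 3))
print(dp_topk_aggregate(ppr, X, k=2, epsilon=0.5).shape)  # (5, 3)
```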

{"title":"DPAR: Decoupled Graph Neural Networks with Node-Level Differential Privacy.","authors":"Qiuchen Zhang, Hong Kyu Lee, Jing Ma, Jian Lou, Carl Yang, Li Xiong","doi":"10.1145/3589334.3645531","DOIUrl":"10.1145/3589334.3645531","url":null,"abstract":"<p><p>Graph Neural Networks (GNNs) have achieved great success in learning with graph-structured data. Privacy concerns have also been raised for the trained models which could expose the sensitive information of graphs including both node features and the structure information. In this paper, we aim to achieve node-level differential privacy (DP) for training GNNs so that a node and its edges are protected. Node DP is inherently difficult for GNNs because all direct and multi-hop neighbors participate in the calculation of gradients for each node via layer-wise message passing and there is no bound on how many direct and multi-hop neighbors a node can have, so existing DP methods will result in high privacy cost or poor utility due to high node sensitivity. We propose a <b>D</b>ecoupled GNN with Differentially <b>P</b>rivate <b>A</b>pproximate Personalized Page<b>R</b>ank (DPAR) for training GNNs with an enhanced privacy-utility tradeoff. The key idea is to decouple the feature projection and message passing via a DP PageRank algorithm which learns the structure information and uses the top-<i>K</i> neighbors determined by the PageRank for feature aggregation. By capturing the most important neighbors for each node and avoiding the layer-wise message passing, it bounds the node sensitivity and achieves improved privacy-utility tradeoff compared to layer-wise perturbation based methods. We theoretically analyze the node DP guarantee for the two processes combined together and empirically demonstrate better utilities of DPAR with the same level of node DP compared with state-of-the-art methods.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":"2024 ","pages":"1170-1181"},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660558/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142878919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exploring Representations for Singular and Multi-Concept Relations for Biomedical Named Entity Normalization.
Clint Cuffy, Evan French, Sophia Fehrmann, Bridget T McInnes

Since the rise of the COVID-19 pandemic, peer-reviewed biomedical repositories have experienced a surge in chemical- and disease-related queries. These queries use a wide variety of naming conventions and nomenclatures, from trademark and generic names to mentions of chemical composition. Normalizing or disambiguating these mentions within texts provides researchers and data curators with more relevant articles returned by their search queries. Named entity normalization aims to automate this disambiguation process by linking entity mentions to their appropriate candidate concepts within a biomedical knowledge base or ontology. We explore several term embedding aggregation techniques, as well as how a term's context affects evaluation performance. We also evaluate our embedding approaches for normalizing term instances containing one or many relations within unstructured texts.
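
As one concrete reading of "term embedding aggregation", a mention's token embeddings can be pooled into a single vector and linked to the nearest concept by cosine similarity. This is a hedged sketch: the mean-pooling choice and the helper name `normalize_mention` are illustrative, not the paper's specific method.

```python
import numpy as np

def normalize_mention(mention_tok_vecs, concept_vecs, concept_ids):
    """Mean-pool a mention's token embeddings into one vector, then
    link the mention to the nearest candidate concept by cosine
    similarity (one simple aggregation strategy among several)."""
    m = np.mean(mention_tok_vecs, axis=0)  # aggregate tokens -> one vector
    m = m / np.linalg.norm(m)
    C = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    sims = C @ m                           # cosine similarity to every concept
    return concept_ids[int(np.argmax(sims))], float(sims.max())

# toy usage: a 2-token mention, 4-dim embeddings, 3 candidate concepts
rng = np.random.default_rng(0)
toks = rng.random((2, 4))
concepts = rng.random((3, 4))
print(normalize_mention(toks, concepts, np.array(["C001", "C002", "C003"])))
```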

{"title":"Exploring Representations for Singular and Multi-Concept Relations for Biomedical Named Entity Normalization.","authors":"Clint Cuffy,&nbsp;Evan French,&nbsp;Sophia Fehrmann,&nbsp;Bridget T McInnes","doi":"10.1145/3487553.3524701","DOIUrl":"https://doi.org/10.1145/3487553.3524701","url":null,"abstract":"<p><p>Since the rise of the COVID-19 pandemic, peer-reviewed biomedical repositories have experienced a surge in chemical and disease related queries. These queries have a wide variety of naming conventions and nomenclatures from trademark and generic, to chemical composition mentions. Normalizing or disambiguating these mentions within texts provides researchers and data-curators with more relevant articles returned by their search query. Named entity normalization aims to automate this disambiguation process by linking entity mentions onto their appropriate candidate concepts within a biomedical knowledge base or ontology. We explore several term embedding aggregation techniques in addition to how the term's context affects evaluation performance. We also evaluate our embedding approaches for normalizing term instances containing one or many relations within unstructured texts.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":"2022 ","pages":"823-832"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/84/c6/nihms-1914411.PMC10353314.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9850563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Context-Enriched Learning Models for Aligning Biomedical Vocabularies at Scale in the UMLS Metathesaurus.
Pub Date: 2022-04-01 Epub Date: 2022-04-25 DOI: 10.1145/3485447.3511946
Vinh Nguyen, Hong Yung Yip, Goonmeet Bajaj, Thilini Wijesiriwardene, Vishesh Javangula, Srinivasan Parthasarathy, Amit Sheth, Olivier Bodenreider

The Unified Medical Language System (UMLS) Metathesaurus construction process mainly relies on lexical algorithms and manual expert curation for integrating over 200 biomedical vocabularies. A lexical-based learning model (LexLM) was developed to predict synonymy among Metathesaurus terms and largely outperforms a rule-based approach (RBA) that approximates the current construction process. However, the LexLM has the potential for being improved further because it only uses lexical information from the source vocabularies, while the RBA also takes advantage of contextual information. We investigate the role of multiple types of contextual information available to the UMLS editors, namely source synonymy (SS), source semantic group (SG), and source hierarchical relations (HR), for the UMLS vocabulary alignment (UVA) problem. In this paper, we develop multiple variants of context-enriched learning models (ConLMs) by adding to the LexLM the types of contextual information listed above. We represent these context types in context-enriched knowledge graphs (ConKGs) with four variants ConSS, ConSG, ConHR, and ConAll. We train these ConKG embeddings using seven KG embedding techniques. We create the ConLMs by concatenating the ConKG embedding vectors with the word embedding vectors from the LexLM. We evaluate the performance of the ConLMs using the UVA generalization test datasets with hundreds of millions of pairs. Our extensive experiments show a significant performance improvement from the ConLMs over the LexLM, namely +5.0% in precision (93.75%), +0.69% in recall (93.23%), +2.88% in F1 (93.49%) for the best ConLM. Our experiments also show that the ConAll variant including the three context types takes more time, but does not always perform better than other variants with a single context type. Finally, our experiments show that the pairs of terms with high lexical similarity benefit most from adding contextual information, namely +6.56% in precision (94.97%), +2.13% in recall (93.23%), +4.35% in F1 (94.09%) for the best ConLM. The pairs with lower degrees of lexical similarity also show performance improvement with +0.85% in F1 (96%) for low similarity and +1.31% in F1 (96.34%) for no similarity. These results demonstrate the importance of using contextual information in the UVA problem.
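
The fusion step the abstract describes, concatenating a term's ConKG context embedding with its LexLM word embedding before classification, is mechanically simple. A minimal sketch under that reading; the function name and dimensions below are illustrative.

```python
import numpy as np

def conlm_features(lexlm_a, lexlm_b, conkg_a, conkg_b):
    """Build a ConLM input by concatenating each term's LexLM word
    embedding with its ConKG (context) embedding, then pairing the
    two enriched term vectors for a synonymy classifier."""
    a = np.concatenate([lexlm_a, conkg_a])
    b = np.concatenate([lexlm_b, conkg_b])
    return np.concatenate([a, b])  # fed to the downstream classifier

# toy usage: 4-dim word vectors plus 2-dim context vectors per term
rng = np.random.default_rng(0)
x = conlm_features(rng.random(4), rng.random(4), rng.random(2), rng.random(2))
print(x.shape)  # (12,)
```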

{"title":"Context-Enriched Learning Models for Aligning Biomedical Vocabularies at Scale in the UMLS Metathesaurus.","authors":"Vinh Nguyen,&nbsp;Hong Yung Yip,&nbsp;Goonmeet Bajaj,&nbsp;Thilini Wijesiriwardene,&nbsp;Vishesh Javangula,&nbsp;Srinivasan Parthasarathy,&nbsp;Amit Sheth,&nbsp;Olivier Bodenreider","doi":"10.1145/3485447.3511946","DOIUrl":"https://doi.org/10.1145/3485447.3511946","url":null,"abstract":"<p><p>The Unified Medical Language System (UMLS) Metathesaurus construction process mainly relies on lexical algorithms and manual expert curation for integrating over 200 biomedical vocabularies. A lexical-based learning model (LexLM) was developed to predict synonymy among Metathesaurus terms and largely outperforms a rule-based approach (RBA) that approximates the current construction process. However, the LexLM has the potential for being improved further because it only uses lexical information from the source vocabularies, while the RBA also takes advantage of contextual information. We investigate the role of multiple types of contextual information available to the UMLS editors, namely source synonymy (SS), source semantic group (SG), and source hierarchical relations (HR), for the UMLS vocabulary alignment (UVA) problem. In this paper, we develop multiple variants of context-enriched learning models (ConLMs) by adding to the LexLM the types of contextual information listed above. We represent these context types in context-enriched knowledge graphs (ConKGs) with four variants ConSS, ConSG, ConHR, and ConAll. We train these ConKG embeddings using seven KG embedding techniques. We create the ConLMs by concatenating the ConKG embedding vectors with the word embedding vectors from the LexLM. We evaluate the performance of the ConLMs using the UVA generalization test datasets with hundreds of millions of pairs. Our extensive experiments show a significant performance improvement from the ConLMs over the LexLM, namely +5.0% in precision (93.75%), +0.69% in recall (93.23%), +2.88% in F1 (93.49%) for the best ConLM. Our experiments also show that the ConAll variant including the three context types takes more time, but does not always perform better than other variants with a single context type. Finally, our experiments show that the pairs of terms with high lexical similarity benefit most from adding contextual information, namely +6.56% in precision (94.97%), +2.13% in recall (93.23%), +4.35% in F1 (94.09%) for the best ConLM. The pairs with lower degrees of lexical similarity also show performance improvement with +0.85% in F1 (96%) for low similarity and +1.31% in F1 (96.34%) for no similarity. These results demonstrate the importance of using contextual information in the UVA problem.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":" ","pages":"1037-1046"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9455675/pdf/nihms-1833239.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40360036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics.
Jing Ma, Qiuchen Zhang, Jian Lou, Li Xiong, Joyce C Ho

Modern healthcare systems, knitted together by a web of entities (e.g., hospitals, clinics, pharmacy companies), are collecting a huge volume of healthcare data from a large number of individuals, covering various medical procedures, medications, diagnoses, and lab tests. To extract meaningful medical concepts (i.e., phenotypes) from such higher-arity relational healthcare data, tensor factorization has proven to be an effective approach and has received increasing research attention, due to its intrinsic capability to represent high-dimensional data. Recently, federated learning has offered a privacy-preserving paradigm for collaborative learning among different entities, which seemingly provides an ideal potential to further enhance tensor factorization-based collaborative phenotyping to handle sensitive personal health data. However, existing attempts at federated tensor factorization come with various limitations, including restriction to the classic tensor factorization, high communication cost, and reduced accuracy. We propose a communication efficient federated generalized tensor factorization, which is flexible enough to choose from a variety of losses to best suit different types of data in practice. We design a three-level communication reduction strategy tailored to the generalized tensor factorization, which is able to reduce the uplink communication cost by up to 99.90%. In addition, we theoretically prove that our algorithm does not compromise convergence speed despite the aggressive communication compression. Extensive experiments on two real-world electronic health record datasets demonstrate the efficiency improvements in terms of computation and communication cost.
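
One generic way to realize uplink compression in such a federated setting is to sparsify each client's factor-matrix update before transmission. The sketch below shows plain top-k magnitude sparsification only; it is a stand-in for, not a description of, the paper's three-level reduction strategy.

```python
import numpy as np

def sparsify_update(delta, keep_ratio=0.01):
    """Keep only the largest-magnitude entries of a local factor-matrix
    update before sending it to the server, one common way to cut
    uplink traffic in federated optimization."""
    k = max(1, int(delta.size * keep_ratio))
    thresh = np.partition(np.abs(delta).ravel(), -k)[-k]
    mask = np.abs(delta) >= thresh
    return delta * mask  # in practice, transmit (indices, values) only

# toy usage: a 100x8 factor-matrix gradient, ~99% of entries dropped
rng = np.random.default_rng(0)
g = rng.standard_normal((100, 8))
sparse_g = sparsify_update(g, keep_ratio=0.01)
print(np.count_nonzero(sparse_g), "of", g.size, "entries kept")
```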

{"title":"Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics.","authors":"Jing Ma, Qiuchen Zhang, Jian Lou, Li Xiong, Joyce C Ho","doi":"10.1145/3442381.3449832","DOIUrl":"10.1145/3442381.3449832","url":null,"abstract":"<p><p>Modern healthcare systems knitted by a web of entities (e.g., hospitals, clinics, pharmacy companies) are collecting a huge volume of healthcare data from a large number of individuals with various medical procedures, medications, diagnosis, and lab tests. To extract meaningful medical concepts (i.e., phenotypes) from such higher-arity relational healthcare data, tensor factorization has been proven to be an effective approach and received increasing research attention, due to their intrinsic capability to represent the high-dimensional data. Recently, federated learning offers a privacy-preserving paradigm for collaborative learning among different entities, which seemingly provides an ideal potential to further enhance the tensor factorization-based collaborative phenotyping to handle sensitive personal health data. However, existing attempts to federated tensor factorization come with various limitations, including restrictions to the classic tensor factorization, high communication cost and reduced accuracy. We propose a <i>communication efficient</i> federated <i>generalized</i> tensor factorization, which is flexible enough to choose from a variate of losses to best suit different types of data in practice. We design a three-level communication reduction strategy tailored to the generalized tensor factorization, which is able to reduce the uplink communication cost up to 99.90%. In addition, we theoretically prove that our algorithm does not compromise convergence speed despite the aggressive communication compression. Extensive experiments on two real-world electronics health record datasets demonstrate the efficiency improvements in terms of computation and communication cost.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":"2021 ","pages":"171-182"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8404412/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39388878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Contrastive Lexical Diffusion Coefficient: Quantifying the Stickiness of the Ordinary.
Mohammadzaman Zamani, H Andrew Schwartz

Lexical phenomena, such as clusters of words, disseminate through social networks at different rates, but most models of diffusion focus on the discrete adoption of new lexical phenomena (i.e., new topics or memes). It is possible that much of lexical diffusion happens via the changing rates of existing word categories or concepts (those that are already being used regularly, at least to some extent) rather than new ones. In this study we introduce a new metric, the contrastive lexical diffusion (CLD) coefficient, which attempts to measure the degree to which ordinary language (here, clusters of common words) catches on over friendship connections over time. For instance, topics related to meetings and jobs are found to be sticky, while negative thinking and emotion, and global events like 'school orientation', were found to be less sticky even though their rates change over time. We evaluate the CLD coefficient with both quantitative and qualitative tests, studied over 6 years of language on Twitter. We find CLD predicts the spread of tweets and friendship connections, scores converge with human judgments of lexical diffusion (r=0.92), and CLD coefficients replicate across disjoint networks (r=0.85). Comparing CLD scores can help understand lexical diffusion: positive emotion words appear more diffusive than negative emotions, first-person plurals (we) score higher than other pronouns, and numbers and time appear non-contagious.
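
To make the "contrastive" idea concrete, one could compare how strongly a topic's usage-rate changes correlate across friend pairs versus matched non-friend pairs. The sketch below is an illustrative reading of such a coefficient; the function `cld_score` and its exact form are my construction, not the paper's definition.

```python
import numpy as np

def cld_score(usage, friends, seed=0):
    """Contrast the mean correlation of per-user usage-rate changes
    over friend pairs against an equal number of random non-friend
    pairs. usage: (n_users, n_timesteps) topic-usage rates."""
    deltas = np.diff(usage, axis=1)  # per-user rate changes over time
    def mean_corr(pairs):
        return np.mean([np.corrcoef(deltas[i], deltas[j])[0, 1] for i, j in pairs])
    n = usage.shape[0]
    rng = np.random.default_rng(seed)
    friend_set = {tuple(sorted(p)) for p in friends}
    non_friends = []
    while len(non_friends) < len(friends):
        i, j = rng.integers(n, size=2)
        if i != j and tuple(sorted((i, j))) not in friend_set:
            non_friends.append((int(i), int(j)))
    return mean_corr(friends) - mean_corr(non_friends)

# toy usage: 6 users, 12 time steps of topic-usage rates, 2 friendships
rng = np.random.default_rng(1)
usage = rng.random((6, 12))
print(cld_score(usage, [(0, 1), (2, 3)]))
```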

{"title":"Contrastive Lexical Diffusion Coefficient: Quantifying the Stickiness of the Ordinary.","authors":"Mohammadzaman Zamani,&nbsp;H Andrew Schwartz","doi":"10.1145/3442381.3449819","DOIUrl":"https://doi.org/10.1145/3442381.3449819","url":null,"abstract":"<p><p>Lexical phenomena, such as clusters of words, disseminate through social networks at different rates but most models of diffusion focus on the discrete adoption of new lexical phenomena (i.e. new topics or memes). It is possible much of lexical diffusion happens via the changing rates of existing word categories or concepts (those that are already being used, at least to some extent, regularly) rather than new ones. In this study we introduce a new metric, <i>contrastive lexical diffusion</i> (<i>CLD</i>) <i>coefficient</i>, which attempts to measure the degree to which ordinary language (here clusters of common words) catch on over friendship connections over time. For instance topics related to meeting and job are found to be sticky, while negative thinking and emotion, and global events, like 'school orientation' were found to be less sticky even though they change rates over time. We evaluate CLD coefficient over both quantitative and qualitative tests, studied over 6 years of language on Twitter. We find CLD predicts the spread of tweets and friendship connections, scores converge with human judgments of lexical diffusion (r=0.92), and CLD coefficients replicate across disjoint networks (r=0.85). Comparing CLD scores can help understand lexical diffusion: positive emotion words appear more diffusive than negative emotions, first-person plurals (we) score higher than other pronouns, and numbers and time appear non-contagious.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":"2021 ","pages":"565-574"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3442381.3449819","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39251211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus.
Pub Date: 2021-04-01 Epub Date: 2021-04-19 DOI: 10.1145/3442381.3450128
Vinh Nguyen, Hong Yung Yip, Olivier Bodenreider

With 214 source vocabularies, the construction and maintenance process of the UMLS (Unified Medical Language System) Metathesaurus terminology integration system is costly, time-consuming, and error-prone, as it primarily relies on (1) lexical and semantic processing for suggesting groupings of synonymous terms, and (2) the expertise of UMLS editors for curating these synonymy predictions. This paper aims to improve the UMLS Metathesaurus construction process by developing a novel supervised learning approach to the task of suggesting synonymous pairs that can scale to the size and diversity of the UMLS source vocabularies. We evaluate this deep learning (DL) approach against a rule-based approach (RBA) that approximates the current UMLS Metathesaurus construction process. The key to the generalizability of our approach is the use of various degrees of lexical similarity in negative pairs during the training process. Our initial experiments demonstrate the strong performance of our DL approach across multiple datasets in terms of recall (91-92%), precision (88-99%), and F1 score (89-95%). Our DL approach largely outperforms the RBA method in recall (+23%), precision (+2.4%), and F1 score (+14.1%). This novel approach has great potential for improving the UMLS Metathesaurus construction process by providing better synonymy suggestions to the UMLS editors.
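
The generalizability claim hinges on training with negative pairs that span a range of lexical similarity. Below is a small sketch of that sampling idea, using `difflib`'s string-similarity ratio as a simple stand-in for whatever lexical similarity measure the paper actually uses.

```python
import difflib
import random

def sample_negatives(terms, n_pairs=4, seed=0):
    """Draw candidate non-synonymous pairs and tag each with a lexical
    similarity score (difflib ratio), so training covers both easy and
    hard negatives. In real training, pairs that UMLS labels as
    synonymous would be filtered out first."""
    random.seed(seed)
    pairs = []
    while len(pairs) < n_pairs:
        a, b = random.sample(terms, 2)
        sim = difflib.SequenceMatcher(None, a, b).ratio()
        pairs.append((a, b, sim))
    return sorted(pairs, key=lambda p: -p[2])  # hardest (most similar) first

terms = ["myocardial infarction", "myocardial infarct",
         "cardiac arrest", "heart attack", "renal failure"]
for a, b, s in sample_negatives(terms):
    print(f"{s:.2f}  {a} | {b}")
```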

{"title":"Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus.","authors":"Vinh Nguyen, Hong Yung Yip, Olivier Bodenreider","doi":"10.1145/3442381.3450128","DOIUrl":"10.1145/3442381.3450128","url":null,"abstract":"<p><p>With 214 source vocabularies, the construction and maintenance process of the UMLS (Unified Medical Language System) Metathesaurus terminology integration system is costly, time-consuming, and error-prone as it primarily relies on (1) lexical and semantic processing for suggesting groupings of synonymous terms, and (2) the expertise of UMLS editors for curating these synonymy predictions. This paper aims to improve the UMLS Metathesaurus construction process by developing a novel supervised learning approach for improving the task of suggesting synonymous pairs that can scale to the size and diversity of the UMLS source vocabularies. We evaluate this deep learning (DL) approach against a rule-based approach (RBA) that approximates the current UMLS Metathesaurus construction process. The key to the generalizability of our approach is the use of various degrees of lexical similarity in negative pairs during the training process. Our initial experiments demonstrate the strong performance across multiple datasets of our DL approach in terms of recall (91-92%), precision (88-99%), and F1 score (89-95%). Our DL approach largely outperforms the RBA method in recall (+23%), precision (+2.4%), and F1 score (+14.1%). This novel approach has great potential for improving the UMLS Metathesaurus construction process by providing better synonymy suggestions to the UMLS editors.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":"2021 ","pages":"2672-2683"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8434895/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39410327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient Algorithms towards Network Intervention.
Hui-Ju Hung, Chih-Ya Shen, Wang-Chien Lee, Zhen Lei, De-Nian Yang, Sy-Miin Chow

Research suggests that social relationships have substantial impacts on individuals' health outcomes. Network intervention, through careful planning, can assist a network of users to build healthy relationships. However, most previous work is not designed to assist such planning by carefully examining and improving multiple network characteristics. In this paper, we propose and evaluate algorithms that facilitate network intervention planning through simultaneous optimization of network degree, closeness, betweenness, and local clustering coefficient, under scenarios involving Network Intervention with Limited Degradation - for Single target (NILD-S) and Network Intervention with Limited Degradation - for Multiple targets (NILD-M). We prove that NILD-S and NILD-M are NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP. We propose the Candidate Re-selection with Preserved Dependency (CRPD) algorithm for NILD-S, and the Objective-aware Intervention edge Selection and Adjustment (OISA) algorithm for NILD-M. Various pruning strategies are designed to boost the efficiency of the proposed algorithms. Extensive experiments on various real social networks collected from public schools and the Web, together with an empirical study, show that CRPD and OISA outperform the baselines in both efficiency and effectiveness.
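
As a toy illustration of what optimizing these four metrics at once involves, one can score a candidate intervention edge by how it shifts a target node's degree, closeness, betweenness, and clustering coefficient. This is a naive greedy sketch with networkx, not the CRPD or OISA algorithms themselves.

```python
import networkx as nx

def intervention_gain(G, u, v, target):
    """Score adding edge (u, v) by the change it induces in the target
    node's four metrics: degree, closeness, betweenness, clustering."""
    def metrics(H):
        return (H.degree(target),
                nx.closeness_centrality(H, target),
                nx.betweenness_centrality(H)[target],
                nx.clustering(H, target))
    before = metrics(G)
    H = G.copy()
    H.add_edge(u, v)
    after = metrics(H)
    return tuple(a - b for a, b in zip(after, before))

# toy usage: effect on node 9 of adding edge (0, 9) in the karate club graph
G = nx.karate_club_graph()
print(intervention_gain(G, 0, 9, target=9))
```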

{"title":"Efficient Algorithms towards Network Intervention.","authors":"Hui-Ju Hung, Chih-Ya Shen, Wang-Chien Lee, Zhen Lei, De-Nian Yang, Sy-Miin Chow","doi":"10.1145/3366423.3380269","DOIUrl":"10.1145/3366423.3380269","url":null,"abstract":"<p><p>Research suggests that social relationships have substantial impacts on individuals' health outcomes. Network intervention, through careful planning, can assist a network of users to build healthy relationships. However, most previous work is not designed to assist such planning by carefully examining and improving multiple network characteristics. In this paper, we propose and evaluate algorithms that facilitate network intervention planning through simultaneous optimization of network <i>degree, closeness, betweenness,</i> and <i>local clustering coefficient,</i> under scenarios involving <i>Network Intervention with Limited Degradation - for Single target (NILD-S)</i> and <i>Network Intervention with Limited Degradation - for Multiple targets (NILD-M).</i> We prove that NILD-S and NILD-M are NP-hard and cannot be approximated within any ratio in polynomial time unless P=NP. We propose the <i>Candidate Re-selection with Preserved Dependency (CRPD)</i> algorithm for NILD-S, and the <i>Objective-aware Intervention edge Selection and Adjustment (OISA)</i> algorithm for NILD-M. Various pruning strategies are designed to boost the efficiency of the proposed algorithms. Extensive experiments on various real social networks collected from public schools and Web and an empirical study are conducted to show that CRPD and OISA outperform the baselines in both efficiency and effectiveness.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":"2020 ","pages":"2021-2031"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7368974/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38170365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distributed Tensor Decomposition for Large Scale Health Analytics.
Huan He, Jette Henderson, Joyce C Ho

In the past few decades, there has been rapid growth in the quantity and variety of healthcare data. These large sets of data are usually high dimensional (e.g., patients, their diagnoses, and medications to treat their diagnoses) and cannot be adequately represented as matrices; thus, many existing algorithms cannot analyze them. To accommodate these high dimensional data, tensor factorization, which can be viewed as a higher-order extension of methods like PCA, has attracted much attention and emerged as a promising solution. However, tensor factorization is a computationally expensive task, and existing methods developed to factor large tensors are not flexible enough for real-world situations. To address this scaling problem more efficiently, we introduce SGranite, a distributed, scalable, and sparse tensor factorization method fit through stochastic gradient descent. SGranite offers three contributions: (1) Scalability: it employs a block partitioning and parallel processing design and thus scales to large tensors; (2) Accuracy: we show that our method can achieve results faster without sacrificing the quality of the tensor decomposition; and (3) Flexible constraints: we show our approach can encompass various kinds of constraints including l2 norm, l1 norm, and logistic regularization. We demonstrate SGranite's capabilities in two real-world use cases. In the first, we use Google searches for flu-like symptoms to characterize and predict influenza patterns. In the second, we use SGranite to extract clinically interesting sets (i.e., phenotypes) of patients from electronic health records. Through these case studies, we show SGranite has the potential to be used to rapidly characterize, predict, and manage large multimodal datasets, thereby promising a novel, data-driven solution that can benefit very large segments of the population.
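
The core of an SGD-fit CP factorization is a cheap per-entry update of the factor matrices. Below is a minimal single-machine sketch of that update; SGranite's block partitioning, parallel workers, and constraint options are omitted.

```python
import numpy as np

def cp_sgd_step(factors, idx, value, lr=0.01, reg=0.1):
    """One stochastic-gradient update of a rank-R CP model on a single
    observed tensor entry, under squared loss with an l2 penalty."""
    rows = [F[i].copy() for F, i in zip(factors, idx)]  # snapshot current rows
    pred = np.prod(rows, axis=0).sum()                  # <a_i, b_j, c_k, ...>
    err = pred - value
    for mode, (F, i) in enumerate(zip(factors, idx)):
        others = np.prod([r for m, r in enumerate(rows) if m != mode], axis=0)
        F[i] -= lr * (err * others + reg * rows[mode])  # gradient step per mode
    return err

# toy usage: one observed entry of a 10x8x6 tensor, rank-4 factors
rng = np.random.default_rng(0)
factors = [rng.random((n, 4)) for n in (10, 8, 6)]
print(abs(cp_sgd_step(factors, (2, 3, 1), value=1.0)))
```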

{"title":"Distributed Tensor Decomposition for Large Scale Health Analytics.","authors":"Huan He,&nbsp;Jette Henderson,&nbsp;Joyce C Ho","doi":"10.1145/3308558.3313548","DOIUrl":"10.1145/3308558.3313548","url":null,"abstract":"<p><p>In the past few decades, there has been rapid growth in quantity and variety of healthcare data. These large sets of data are usually high dimensional (e.g. patients, their diagnoses, and medications to treat their diagnoses) and cannot be adequately represented as matrices. Thus, many existing algorithms can not analyze them. To accommodate these high dimensional data, tensor factorization, which can be viewed as a higher-order extension of methods like PCA, has attracted much attention and emerged as a promising solution. However, tensor factorization is a computationally expensive task, and existing methods developed to factor large tensors are not flexible enough for real-world situations. To address this scaling problem more efficiently, we introduce SGranite, a distributed, scalable, and sparse tensor factorization method fit through stochastic gradient descent. SGranite offers three contributions: (1) Scalability: it employs a block partitioning and parallel processing design and thus scales to large tensors, (2) Accuracy: we show that our method can achieve results faster without sacrificing the quality of the tensor decomposition, and (3) FlexibleConstraints: we show our approach can encompass various kinds of constraints including l2 norm, l1 norm, and logistic regularization. We demonstrate SGranite's capabilities in two real-world use cases. In the first, we use Google searches for flu-like symptoms to characterize and predict influenza patterns. In the second, we use SGranite to extract clinically interesting sets (i.e., phenotypes) of patients from electronic health records. Through these case studies, we show SGranite has the potential to be used to rapidly characterize, predict, and manage a large multimodal datasets, thereby promising a novel, data-driven solution that can benefit very large segments of the population.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":"2019 ","pages":"659-669"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3308558.3313548","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37334831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Goal-setting And Achievement In Activity Tracking Apps: A Case Study Of MyFitnessPal.
Mitchell L Gordon, Tim Althoff, Jure Leskovec

Activity tracking apps often make use of goals as one of their core motivational tools. There are two critical components to this tool: setting a goal, and subsequently achieving that goal. Despite its crucial role in how a number of prominent self-tracking apps function, there has been relatively little investigation of the goal-setting and achievement aspects of self-tracking apps. Here we explore this issue, investigating a particular goal setting and achievement process that is extensive, recorded, and crucial for both the app's and its users' success: weight loss goals in MyFitnessPal. We present a large-scale study of 1.4 million users and weight loss goals, allowing for an unprecedentedly detailed view of how people set and achieve their goals. We find that, even for difficult long-term goals, behavior within the first 7 days predicts those who ultimately achieve their goals, that is, those who lose at least as much weight as they set out to, and those who do not. For instance, high amounts of early weight loss, which some researchers have classified as unsustainable, lead to higher goal achievement rates. We also show that early food intake, self-monitoring motivation, and attitude towards the goal are important factors. We then show that we can use our findings to predict goal achievement with 79% ROC AUC just 7 days after a goal is set. Finally, we discuss how our findings could inform steps to improve goal achievement in self-tracking apps.
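
The 7-day prediction setup can be sketched as a standard supervised pipeline: extract early-window features per user, fit a classifier, and report ROC AUC on held-out users. The sketch below uses synthetic data and illustrative feature names, not the study's actual features or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# synthetic stand-ins for first-7-day features: week-1 weight change (kg),
# days with any log, and mean calories relative to budget; label = goal met.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(-0.5, 1.0, n),            # week-1 weight change
    rng.integers(0, 8, n).astype(float), # week-1 logging days
    rng.normal(0.0, 1.0, n),             # calories relative to budget
])
# achievement is made more likely by early loss and frequent logging
y = (X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 1, n) < -1.0).astype(int)

clf = LogisticRegression().fit(X[:800], y[:800])
auc = roc_auc_score(y[800:], clf.predict_proba(X[800:])[:, 1])
print(f"held-out ROC AUC: {auc:.2f}")
```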

{"title":"Goal-setting And Achievement In Activity Tracking Apps: A Case Study Of MyFitnessPal.","authors":"Mitchell L Gordon,&nbsp;Tim Althoff,&nbsp;Jure Leskovec","doi":"10.1145/3308558.3313432","DOIUrl":"https://doi.org/10.1145/3308558.3313432","url":null,"abstract":"<p><p>Activity tracking apps often make use of goals as one of their core motivational tools. There are two critical components to this tool: <i>setting</i> a goal, and subsequently <i>achieving</i> that goal. Despite its crucial role in how a number of prominent self-tracking apps function, there has been relatively little investigation of the goal-setting and achievement aspects of self-tracking apps. Here we explore this issue, investigating a particular goal setting and achievement process that is extensive, recorded, and crucial for both the app and its users' success: weight loss goals in MyFitnessPal. We present a large-scale study of 1.4 million users and weight loss goals, allowing for an unprecedented detailed view of how people set and achieve their goals. We find that, even for difficult long-term goals, behavior within the first 7 days predicts those who ultimately achieve their goals, that is, those who lose at least as much weight as they set out to, and those who do not. For instance, high amounts of early weight loss, which some researchers have classified as unsustainable, leads to higher goal achievement rates. We also show that early food intake, self-monitoring motivation, and attitude towards the goal are important factors. We then show that we can use our findings to predict goal achievement with an accuracy of 79% ROC AUC just 7 days after a goal is set. Finally, we discuss how our findings could inform steps to improve goal achievement in self-tracking apps.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":"2019 ","pages":"571-582"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3308558.3313432","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37902344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
Modeling Interdependent and Periodic Real-World Action Sequences.
Takeshi Kurashima, Tim Althoff, Jure Leskovec

Mobile health applications, including those that track activities such as exercise, sleep, and diet, are becoming widely used. Accurately predicting human actions in the real world is essential for targeted recommendations that could improve our health and for personalization of these applications. However, making such predictions is extremely difficult due to the complexities of human behavior, which consists of a large number of potential actions that vary over time, depend on each other, and are periodic. Previous work has not jointly modeled these dynamics and has largely focused on item consumption patterns instead of broader types of behaviors such as eating, commuting or exercising. In this work, we develop a novel statistical model, called TIPAS, for Time-varying, Interdependent, and Periodic Action Sequences. Our approach is based on personalized, multivariate temporal point processes that model time-varying action propensities through a mixture of Gaussian intensities. Our model captures short-term and long-term periodic interdependencies between actions through Hawkes process-based self-excitations. We evaluate our approach on two activity logging datasets comprising 12 million real-world actions (e.g., eating, sleep, and exercise) taken by 20 thousand users over 17 months. We demonstrate that our approach allows us to make successful predictions of future user actions and their timing. Specifically, TIPAS improves predictions of actions, and their timing, over existing methods across multiple datasets by up to 156%, and up to 37%, respectively. Performance improvements are particularly large for relatively rare and periodic actions such as walking and biking, improving over baselines by up to 256%. This demonstrates that explicit modeling of dependencies and periodicities in real-world behavior enables successful predictions of future actions, with implications for modeling human behavior, app personalization, and targeting of health interventions.
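
The model's intensity for an action type combines a periodic base rate (a mixture of Gaussian bumps over the day) with Hawkes-style self-excitation from recent events. Below is a hedged sketch of evaluating such an intensity; the parameter names and the exponential decay kernel are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

def tipas_intensity(t, history, mu_weights, mu_means, mu_scales,
                    alpha=0.5, beta=1.0):
    """Evaluate a TIPAS-style intensity at hour t: a mixture-of-Gaussians
    base rate over the 24h clock (periodic propensity) plus Hawkes
    self-excitation that decays exponentially after each past event."""
    base = np.sum(mu_weights * np.exp(-((t % 24.0) - mu_means) ** 2
                                      / (2.0 * mu_scales ** 2)))
    past = np.asarray([s for s in history if s < t])
    excitation = alpha * np.sum(np.exp(-beta * (t - past))) if past.size else 0.0
    return base + excitation

# toy usage: meal propensity peaks near 8h, 13h, 19h; two recent events
w = np.array([0.6, 0.8, 0.7])
m = np.array([8.0, 13.0, 19.0])
s = np.array([1.0, 1.0, 1.5])
print(tipas_intensity(13.5, history=[12.9, 13.1],
                      mu_weights=w, mu_means=m, mu_scales=s))
```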

{"title":"Modeling Interdependent and Periodic Real-World Action Sequences.","authors":"Takeshi Kurashima, Tim Althoff, Jure Leskovec","doi":"10.1145/3178876.3186161","DOIUrl":"10.1145/3178876.3186161","url":null,"abstract":"<p><p>Mobile health applications, including those that track activities such as exercise, sleep, and diet, are becoming widely used. Accurately predicting human actions in the real world is essential for targeted recommendations that could improve our health and for personalization of these applications. However, making such predictions is extremely difficult due to the complexities of human behavior, which consists of a large number of potential actions that vary over time, depend on each other, and are periodic. Previous work has not jointly modeled these dynamics and has largely focused on item consumption patterns instead of broader types of behaviors such as eating, commuting or exercising. In this work, we develop a novel statistical model, called <i>TIPAS</i>, for Time-varying, Interdependent, and Periodic Action Sequences. Our approach is based on personalized, multivariate temporal point processes that model time-varying action propensities through a mixture of Gaussian intensities. Our model captures short-term and long-term periodic interdependencies between actions through Hawkes process-based self-excitations. We evaluate our approach on two activity logging datasets comprising 12 million real-world actions (<i>e.g.</i>, eating, sleep, and exercise) taken by 20 thousand users over 17 months. We demonstrate that our approach allows us to make successful predictions of future user actions and their timing. Specifically, TIPAS improves predictions of actions, and their timing, over existing methods across multiple datasets by up to 156%, and up to 37%, respectively. Performance improvements are particularly large for relatively rare and periodic actions such as walking and biking, improving over baselines by up to 256%. This demonstrates that explicit modeling of dependencies and periodicities in real-world behavior enables successful predictions of future actions, with implications for modeling human behavior, app personalization, and targeting of health interventions.</p>","PeriodicalId":74532,"journal":{"name":"Proceedings of the ... International World-Wide Web Conference. International WWW Conference","volume":"2018 ","pages":"803-812"},"PeriodicalIF":0.0,"publicationDate":"2018-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5959287/pdf/nihms958398.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36115485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0