
Latest publications from Proceedings of The Web Conference 2020

PG2S+: Stack Distance Construction Using Popularity, Gap and Machine Learning
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380176
Jiangwei Zhang, Y. Tay
Stack distance characterizes the temporal locality of workloads and has played a vital role in cache analysis since the 1970s. However, exact stack distance calculation is too costly and impractical for online use. Hence, much work has been done to optimize the exact computation, or to approximate it through sampling or modeling. This paper introduces a new approximation technique, PG2S, that is based on reference popularity and gap distance. This approximation is exact under the Independent Reference Model (IRM). Using machine learning, the technique is further extended to PG2S+ for non-IRM reference patterns. Extensive experiments show that PG2S+ is much more accurate and robust than other state-of-the-art algorithms for determining stack distance. PG2S+ is the first technique to exploit the strong correlation among reference popularity, gap distance and stack distance.
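For context, the exact computation that PG2S approximates is the classic LRU stack distance: the number of distinct addresses referenced since an item's previous access. A minimal, naive sketch of that textbook computation (not the paper's algorithm, which avoids exactly this cost):

```python
def stack_distances(trace):
    """Naive exact LRU stack distance: O(n*m) for n references over
    m distinct items, which is why online use needs approximation."""
    stack = []   # LRU stack: most recently used item at the front
    dists = []
    for x in trace:
        if x in stack:
            d = stack.index(x)       # depth in the LRU stack
            stack.remove(x)
        else:
            d = float("inf")         # first access: infinite distance
        stack.insert(0, x)           # x becomes most recently used
        dists.append(d)
    return dists
```

For example, in the trace `a b c b a`, the second `b` has distance 1 (only `c` intervened) and the second `a` has distance 2.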
Citations: 3
Measurements, Analyses, and Insights on the Entire Ethereum Blockchain Network
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380103
Xi Tong Lee, Arijit Khan, Sourav Sengupta, Yu-Han Ong, Xu Liu
Blockchains are increasingly popular due to the prevalence of cryptocurrencies and decentralized applications. Ethereum is a distributed public blockchain network that focuses on running code (smart contracts) for decentralized applications. Put simply, it is a platform for sharing information in a global state that cannot be manipulated or changed. The Ethereum blockchain introduces a novel ecosystem of human users and autonomous agents (smart contracts). In this network, we are interested in all possible interactions: user-to-user, user-to-contract, contract-to-user, and contract-to-contract. This requires us to construct interaction networks from the entire Ethereum blockchain data, where vertices are accounts (users, contracts) and arcs denote interactions. Our analyses of these networks reveal new insights by combining information from the four networks. We perform an in-depth study of these networks based on several local and global graph properties, discuss their similarities and differences with social networks and the Web, draw interesting conclusions, and highlight important future research directions.
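The four interaction networks described above can be sketched by partitioning transactions according to whether each endpoint is a user or a contract. The schema below (sender/receiver pairs plus a set of known contract addresses) is illustrative, not the paper's actual data model:

```python
from collections import defaultdict

def build_networks(transactions, contracts):
    """transactions: iterable of (sender, receiver) account pairs.
    contracts: set of accounts known to be smart contracts.
    Returns the arcs of each of the four interaction networks,
    keyed by (sender_kind, receiver_kind)."""
    nets = defaultdict(list)
    for src, dst in transactions:
        kind = ("contract" if src in contracts else "user",
                "contract" if dst in contracts else "user")
        nets[kind].append((src, dst))   # arc in the matching network
    return nets
```

Graph measures (degree distributions, clustering, components) can then be computed per network or on their union.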
Citations: 54
P-Simrank: Extending Simrank to Scale-Free Bipartite Networks
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380081
Prasenjit Dey, Kunal Goel, Rahul Agrawal
The measure of similarity between nodes in a graph is a useful tool in many areas of computer science. SimRank, proposed by Jeh and Widom [7], is a classic measure of the similarity of nodes in a graph that has both theoretical and intuitive appeal, and it has been extensively studied and used in many applications such as query rewriting, link prediction, and collaborative filtering. Existing works based on SimRank primarily focus on preserving the microscopic structure, such as the second- and third-order proximity of the vertices, while the macroscopic scale-free property is largely ignored. The scale-free property, where vertex degrees follow a heavy-tailed distribution, is a critical property of real-world web graphs. In this paper, we introduce P-Simrank, which extends the idea of SimRank to scale-free bipartite networks. To study the efficacy of the proposed solution on a real-world problem, we tested it on the well-known query-rewriting problem in the sponsored-search domain using a bipartite click graph, similar to Simrank++ [1], which acts as our baseline. We show that Simrank++ produces sub-optimal similarity scores for bipartite graphs whose vertex degree distribution follows a power law. We also show how P-Simrank can be optimized for large real-world graphs. Finally, we experimentally evaluate the P-Simrank algorithm against Simrank++ using actual click graphs obtained from Bing, and show that P-Simrank outperforms Simrank++ on a variety of metrics.
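As background, the original SimRank recurrence of Jeh and Widom, which P-Simrank builds on, can be sketched as a fixed-point iteration: two nodes are similar if their in-neighbors are similar, scaled by a decay factor C. This is the baseline formulation, not P-Simrank itself:

```python
def simrank(in_nbrs, C=0.8, iters=10):
    """in_nbrs: dict mapping each node to its list of in-neighbors.
    Returns pairwise SimRank scores after a fixed number of iterations."""
    nodes = list(in_nbrs)
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif in_nbrs[a] and in_nbrs[b]:
                    # average similarity of in-neighbor pairs, decayed by C
                    total = sum(s[(i, j)]
                                for i in in_nbrs[a] for j in in_nbrs[b])
                    new[(a, b)] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
                else:
                    new[(a, b)] = 0.0   # no in-neighbors: similarity 0
        s = new
    return s
```

For instance, two nodes that share a single common in-neighbor converge to a score of C.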
Citations: 5
Interpretable Complex Question Answering
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380764
Soumen Chakrabarti
We will review the cross-community co-evolution of question answering (QA) with the advent of large-scale knowledge graphs (KGs), continuous representations of text and graphs, and deep sequence analysis. Early QA systems were information retrieval (IR) systems enhanced to extract named-entity spans from high-scoring passages. Starting with WordNet, a series of structured curations of language and world knowledge, called KGs, enabled further improvements. Corpora are unstructured and messy to exploit for QA. If a question can be answered using the KG alone, it is attractive to ‘interpret’ the free-form question into a structured query, which is then executed on the structured KG. This process is called KGQA. Answers can be high-quality and explainable if the KG has an answer, but manual curation results in low coverage. KGs were soon found useful for harnessing corpus information. Named-entity mention spans could be tagged with fine-grained types (e.g., scientist), or even specific entities (e.g., Einstein). The QA system can learn to decompose a query into functional parts, e.g., “which scientist” and “played the violin”. With the increasing success of such systems, ambition grew to address multi-hop or multi-clause queries, e.g., “the father of the director of La La Land teaches at which university?” or “who directed an award-winning movie and is the son of a Princeton University professor?” Questions limited to simple path traversals in KGs have been encoded to a vector representation, which a decoder then uses to guide the KG traversal. Recently, the corpus counterpart of such strategies has also been proposed. However, for general multi-clause queries that do not necessarily translate to paths, that seek to bind multiple variables to satisfy multiple clauses, or that involve logic, comparison, aggregation and other arithmetic, neural programmer-interpreter systems have seen some success.
Our key focus will be on identifying situations where the manual introduction of structural bias is essential for accuracy, as opposed to cases where sufficient data can compensate for distant or no supervision.
Citations: 4
Privacy-preserving AI Services Through Data Decentralization
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380106
Christian Meurisch, Bekir Bayrak, M. Mühlhäuser
User services increasingly base their actions on AI models, e.g., to offer personalized and proactive support. However, the underlying AI algorithms require a continuous stream of personal data, leading to privacy issues, as users typically have to share this data outside their own territory. Current privacy-preserving concepts are either not applicable to such AI-based services or work to the disadvantage of one of the parties. This paper presents PrivAI, a new decentralized and privacy-by-design platform that removes the need to share user data in order to benefit from personalized AI services. In short, PrivAI complements existing approaches to personal data stores, but strictly enforces the confinement of raw user data. PrivAI further addresses the resulting challenges by (1) dividing AI algorithms into cloud-based general model training, subsequent local personalization, and community-based sharing of model updates for new users; and by (2) loading confidential AI models into a trusted execution environment, thus protecting providers’ intellectual property (IP). Our experiments show the feasibility and effectiveness of PrivAI, with performance comparable to currently practiced approaches.
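The split between cloud-based general training and local personalization can be illustrated with a toy sketch in which raw data stays on the device and only a weight delta is shared. The 1-D linear model and function names here are hypothetical, not PrivAI's API:

```python
def local_personalize(w_general, data, lr=0.1, epochs=100):
    """Fine-tune a cloud-trained general weight w_general on local data,
    a list of (x, y) pairs for a 1-D linear model y ~ w * x.
    Raw (x, y) pairs never leave this function; only the resulting
    weight delta would be shared with the community."""
    w = w_general
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x   # gradient step on squared error
    return w, w - w_general                 # personalized weight, shareable delta
```

In the real system the "model" would be a neural network and the confidential general model would run inside a trusted execution environment, but the data flow is the same: data in, delta out.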
Citations: 19
A Category-Aware Deep Model for Successive POI Recommendation on Sparse Check-in Data
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380202
Fuqiang Yu, Li-zhen Cui, Wei Guo, Xudong Lu, Qingzhong Li, Hua Lu
As considerable amounts of POI check-in data have accumulated, successive point-of-interest (POI) recommendation is increasingly popular. Existing successive POI recommendation methods only predict where a user will go next, ignoring when this behavior will occur. In this work, we focus on predicting the POIs that a user will visit in the next 24 hours. As check-in data is very sparse, it is challenging to accurately capture user preferences in temporal patterns. To this end, we propose a category-aware deep model, CatDM, that incorporates POI categories and geographical influence to reduce the search space and overcome data sparsity. We design two deep encoders based on LSTM to model the time-series data. The first encoder captures user preferences over POI categories, whereas the second exploits user preferences over POIs. To account for clock influence in the second encoder, we divide each user’s check-in history into several time windows and develop a personalized attention mechanism for each window, enabling CatDM to exploit temporal patterns. Moreover, to sort the candidate set, we consider four specific dependencies: user-POI, user-category, POI-time, and POI-user current preferences. Extensive experiments are conducted on two large real datasets. The experimental results demonstrate that CatDM outperforms state-of-the-art models for successive POI recommendation on sparse check-in data.
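One ingredient described above, dividing each user's check-in history into time windows, can be sketched as simple hour-of-day bucketing. The window boundaries below are illustrative, not CatDM's actual configuration:

```python
def split_into_windows(checkins, n_windows=4):
    """checkins: list of (poi_id, hour) pairs with hour in 0..23.
    Returns n_windows buckets of POI ids; e.g. with n_windows=4,
    each window covers a 6-hour slice of the day."""
    width = 24 // n_windows
    windows = [[] for _ in range(n_windows)]
    for poi, hour in checkins:
        windows[hour // width].append(poi)
    return windows
```

A per-window attention mechanism would then weight the POIs inside each bucket separately when forming the user representation.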
Citations: 82
Efficient Online Multi-Task Learning via Adaptive Kernel Selection
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3379993
Peng Yang, P. Li
Conventional multi-task models restrict the task structure to be linearly related, which may not be suitable when the data are linearly non-separable. To remedy this issue, we propose a kernel algorithm for online multi-task classification, as the large approximation space provided by reproducing kernel Hilbert spaces often contains an accurate function. Specifically, it maintains a local-global Gaussian distribution over each task model that guides the direction and scale of parameter updates. Nonetheless, optimizing over this space is computationally expensive. Moreover, most multi-task learning methods require access to all training instances, a luxury unavailable in large-scale streaming learning scenarios. To overcome this issue, we propose a randomized kernel sampling technique across multiple tasks. Instead of requiring labels for all inputs, the proposed algorithm decides whether to query a label by considering the confidence of the related tasks in the label prediction. Theoretically, the algorithm trained on actively sampled labels can achieve results comparable to one trained on all labels. Empirically, the proposed algorithm achieves promising learning efficacy while simultaneously reducing computational complexity and labeling cost.
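The selective-labeling idea, querying a label only when the model's confidence is low, can be illustrated with a toy margin-based perceptron. This is a drastic simplification for intuition; the paper's algorithm is kernel-based, multi-task, and uses confidence from related tasks:

```python
def selective_perceptron(stream, threshold=0.5, lr=1.0):
    """stream: list of (x_vec, y) with y in {-1, +1}.
    The true label is 'queried' (counted) only when the current
    margin |w . x| is below threshold; returns (weights, n_queried)."""
    w = [0.0] * len(stream[0][0])
    queried = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if abs(score) < threshold:       # low confidence: pay for a label
            queried += 1
            if score * y <= 0:           # mistake: perceptron update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w, queried
```

Confidently classified points cost nothing, which is what makes such algorithms viable on large label-scarce streams.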
Citations: 1
Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380150
Khoa D Doan, C. Reddy
Searching for documents with semantically similar content is a fundamental problem in information retrieval, with challenges primarily in efficiency and effectiveness. Despite the promise of modeling structured dependencies in documents, several existing text hashing methods lack an efficient mechanism to incorporate such vital information. Additionally, the desired characteristics of an ideal hash function, such as robustness to noise, low quantization error, and bit balance/uncorrelation, are not effectively learned with existing methods, because they require either tuning additional hyper-parameters or optimizing heuristically and explicitly constructed cost functions. In this paper, we propose a Denoising Adversarial Binary Autoencoder (DABA) model, a novel representation-learning framework that captures a structured representation of text documents in the learned hash function. Adversarial training also provides an alternative way to implicitly learn a hash function that captures all the desired characteristics of an ideal hash function. Essentially, DABA adopts a novel single-optimization adversarial training procedure that minimizes the Wasserstein distance in its primal domain to regularize the encoder’s output of either a recurrent neural network or a convolutional autoencoder. We empirically demonstrate the effectiveness of our proposed method in capturing the intrinsic semantic manifold of related documents. The proposed method outperforms the current state-of-the-art shallow and deep unsupervised hashing methods for the document retrieval task on several prominent document collections.
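As a backdrop, once an encoder produces real-valued document codes, hashing-based retrieval typically binarizes them (the quantization the abstract mentions) and ranks documents by Hamming distance. A generic sketch of that retrieval step, not DABA's training procedure:

```python
def to_code(vec):
    """Threshold a real-valued encoder output into a binary hash code."""
    return tuple(1 if v > 0 else 0 for v in vec)

def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))

def rank_by_hamming(query_vec, doc_vecs):
    """Return document indices ordered by Hamming distance to the query."""
    q = to_code(query_vec)
    return sorted(range(len(doc_vecs)),
                  key=lambda i: hamming(q, to_code(doc_vecs[i])))
```

The quality of the learned encoder determines how well small Hamming distances track semantic similarity, which is exactly what the adversarial regularization targets.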
Cited by: 8
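Whatever model learns the binary codes (DABA included), the retrieval step in text hashing reduces to ranking documents by Hamming distance between hash codes. A minimal NumPy sketch of that lookup step, using hypothetical toy codes rather than actual model output:

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank documents by Hamming distance between binary hash codes.

    query_code: (n_bits,) array of 0/1 bits for the query document
    db_codes:   (n_docs, n_bits) array of 0/1 bits for the collection
    Returns document indices, nearest (fewest differing bits) first.
    """
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

# Toy 8-bit codes (hypothetical; a trained hashing model would produce these).
db = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],  # doc 0
    [1, 0, 1, 1, 0, 1, 1, 0],  # doc 1: one bit away from doc 0
    [0, 1, 0, 0, 1, 1, 0, 1],  # doc 2: complement of doc 0
])
q = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # query code identical to doc 0
print(hamming_rank(q, db))  # -> [0 1 2]
```

Because Hamming distance over packed bits is a cheap XOR-plus-popcount, this lookup scales to large collections far better than dense similarity search, which is the efficiency motivation behind hashing methods in the first place.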
Democratizing Content Creation and Dissemination through AI Technology
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3382669
Wei-Ying Ma
With the rise of mobile video, user-generated content, and social networks, there is a massive opportunity for disruptive innovations in the media and content industry. It is now a fast-changing landscape with rapid advances in AI-powered content creation, dissemination and interaction technologies. I believe the current trends are leading us towards a world where everyone is equally empowered to produce high-quality content in video, music, augmented reality or more – and to share their information, knowledge, and stories with a large global audience. This new AI-powered content platform can further lead to innovations in advertising, e-commerce, online education, and productivity. I will share the current research efforts at ByteDance connected to this emerging new platform through products such as Douyin and TikTok, and discuss the challenges and the direction of our future research.
Cited by: 1
Task-Oriented Genetic Activation for Large-Scale Complex Heterogeneous Graph Embedding
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380230
Zhuoren Jiang, Zheng Gao, Jinjiong Lan, Hongxia Yang, Yao Lu, Xiaozhong Liu
The recent success of deep graph embedding has advanced graphical information characterization methodologies. However, in real-world applications, such methods still struggle with the challenges of heterogeneity, scalability, and multiplexity. To address these challenges, in this study, we propose a novel solution, Genetic hEterogeneous gRaph eMbedding (GERM), which enables flexible and efficient task-driven vertex embedding in a complex heterogeneous graph. Unlike prior efforts in this line of research, we employ a task-oriented genetic activation strategy to efficiently generate the “Edge Type Activated Vector” (ETAV) over the edge types in the graph. The generated ETAV not only reduces incompatible noise and guides the heterogeneous-graph random walk at the graph-schema level, but also activates an optimized subgraph for efficient representation learning. By revealing the correlation between graph structure and task information, the model’s interpretability is enhanced as well. Meanwhile, an activated heterogeneous skip-gram framework is proposed to encapsulate both the topological and task-specific information of a given heterogeneous graph. Through extensive experiments on both scholarly and e-commerce datasets, we demonstrate the efficacy and scalability of the proposed methods via various search/recommendation tasks. Compared with baselines, GERM significantly reduces running time and removes expert intervention without sacrificing (and even modestly improving) performance.
Cited by: 13
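To make the edge-type activation concrete: once an ETAV marks which edge types are active, random walks simply never traverse deactivated types, so the skip-gram corpus is drawn only from the activated subgraph. A minimal sketch under hypothetical graph data and edge-type labels (not the authors' code):

```python
import random

def activated_walk(graph, start, etav, length, rng=random):
    """Random walk over a heterogeneous graph that traverses only
    edge types activated in `etav` (a set standing in for the ETAV)."""
    walk = [start]
    for _ in range(length - 1):
        candidates = [v for v, t in graph[walk[-1]] if t in etav]
        if not candidates:  # dead end under this activation
            break
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical scholarly graph: authors (a*), papers (p*), venues (v*).
g = {
    "a1": [("p1", "writes")],
    "p1": [("a1", "writes"), ("v1", "published_in"), ("p2", "cites")],
    "p2": [("p1", "cites")],
    "v1": [("p1", "published_in")],
}
# Activating only "writes"/"cites" confines walks to the author-citation
# subgraph, so venue nodes never enter the training corpus.
print(activated_walk(g, "a1", {"writes", "cites"}, length=4))
```

In GERM the activation vector itself is searched genetically per task; the sketch above only shows why a good activation matters, since every deactivated edge type prunes both noise and candidate transitions from the walks.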
Journal: Proceedings of The Web Conference 2020