Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: Latest Articles

STAR: A System for Ticket Analysis and Resolution

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098190

Wubai Zhou, Wei Xue, Ramesh Baral, Qing Wang, Chunqiu Zeng, Tao Li, Jian Xu, Zheng Liu, L. Shwartz, G. Grabarnik

In large scale and complex IT service environments, a problematic incident is logged as a ticket and contains the ticket summary (system status and problem description). The system administrators log the step-wise resolution description when such tickets are resolved. The repeating service events are most likely resolved by inferring similar historical tickets. With the availability of reasonably large ticket datasets, we can have an automated system to recommend the best matching resolution for a given ticket summary. In this paper, we first identify the challenges in real-world ticket analysis and develop an integrated framework to efficiently handle those challenges. The framework first quantifies the quality of ticket resolutions using a regression model built on carefully designed features. The tickets, along with their quality scores obtained from the resolution quality quantification, are then used to train a deep neural network ranking model that outputs the matching scores of ticket summary and resolution pairs. This ranking model allows us to leverage the resolution quality in historical tickets when recommending resolutions for an incoming incident ticket. In addition, the feature vectors derived from the deep neural ranking model can be effectively used in other ticket analysis tasks, such as ticket classification and clustering. The proposed framework is extensively evaluated with a large real-world dataset.

Citations: 36

TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098021

Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, M. Ispir, Vihan Jain, L. Koc, C. Koo, Lukasz Lew, Clemens Mewald, A. Modi, N. Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Euijong Whang, M. Wicke, Jarek Wilkiewicz, Xin Zhang, Martin A. Zinkevich

Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components---a learner for generating models based on training data, modules for analyzing and validating both data as well as models, and finally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt. We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions. We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis.

Citations: 355

A Taxi Order Dispatch Model based On Combinatorial Optimization

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098138

Lingyu Zhang, Tao Hu, Yue Min, Guobin Wu, Junying Zhang, Pengcheng Feng, Pinghua Gong, Jieping Ye

Taxi-booking apps have become very popular all over the world because they offer users conveniences such as fast response times. The key component of a taxi-booking app is the dispatch system, which aims to provide optimal matches between drivers and riders. Traditional dispatch systems sequentially dispatch taxis to riders and aim to maximize the driver acceptance rate for each individual order. However, such systems may lead to a low global success rate, which degrades the rider experience when using the app. In this paper, we propose a novel system that attempts to optimally dispatch taxis to serve multiple bookings. The proposed system aims to maximize the global success rate and thus optimizes the overall travel efficiency, leading to an enhanced user experience. To further enhance the user experience, we also propose a method to predict a user's destination once the taxi-booking app is started. The proposed method employs a Bayesian framework to model the distribution of a user's destination based on his/her travel histories. We use rigorous A/B tests to compare our new taxi dispatch method with state-of-the-art models using data collected in Beijing. Experimental results show that the proposed method is significantly better than other state-of-the-art models in terms of global success rate (increased from 80% to 84%). Moreover, we have also achieved significant improvements on other metrics such as user waiting time and pick-up distance. For our destination prediction algorithm, we show that our proposed model is superior to the baseline model, improving the top-3 accuracy from 89% to 93%. The proposed taxi dispatch and destination prediction algorithms are both deployed in our online systems and serve tens of millions of users every day.
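
The contrast between sequential and global dispatch can be illustrated with a toy sketch. This is not the paper's model, which the abstract does not spell out: the acceptance probabilities, the function names, and the brute-force search over one-to-one assignments are all assumptions made for illustration only.

```python
from itertools import permutations

# Hypothetical data: accept[i][j] is the probability that driver j
# accepts order i. The numbers are invented for this illustration.
accept = [
    [0.9, 0.8],
    [0.8, 0.1],
]


def sequential_dispatch(accept):
    """Greedy baseline: each order in turn takes its best still-free driver."""
    free = set(range(len(accept[0])))
    assignment = []
    for probs in accept:
        best = max(free, key=lambda j: probs[j])
        free.remove(best)
        assignment.append(best)
    return assignment


def global_dispatch(accept):
    """Brute force over all one-to-one assignments (small inputs only).

    Picks the assignment with the largest expected number of accepted
    orders, a stand-in for a "global success rate" objective.
    """
    n_orders = len(accept)
    drivers = range(len(accept[0]))
    best = max(
        permutations(drivers, n_orders),
        key=lambda p: sum(accept[i][p[i]] for i in range(n_orders)),
    )
    return list(best)


def expected_successes(accept, assignment):
    """Expected number of accepted orders under a given assignment."""
    return sum(accept[i][j] for i, j in enumerate(assignment))


greedy = sequential_dispatch(accept)
joint = global_dispatch(accept)
print(greedy, expected_successes(accept, greedy))
print(joint, expected_successes(accept, joint))
```

In this toy instance the greedy pass gives the first order its favorite driver and leaves the second order with a driver who is unlikely to accept, while the joint search accepts a slightly worse match for the first order to improve the total. A production system would need a scalable solver rather than enumeration.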

Citations: 180

Inferring the Strength of Social Ties: A Community-Driven Approach

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098199

Polina Rozenshtein, Nikolaj Tatti, A. Gionis

Online social networks are growing and becoming denser. The social connections of a given person may vary widely: from close friends and relatives to acquaintances to people they hardly know. Inferring the strength of social ties is an important ingredient for modeling the interaction of users in a network and understanding their behavior. Furthermore, the problem has applications in computational social science, viral marketing, and people recommendation. In this paper we study the problem of inferring the strength of social ties in a given network. Our work is motivated by a recent approach by Sintos et al. [24], which leverages the Strong Triadic Closure (STC) principle, a hypothesis rooted in social psychology. To guide our inference process, in addition to the network structure, we also consider as input a collection of tight communities. Those are sets of vertices that we expect to be connected via strong ties. Such communities appear in different situations, e.g., when being part of a community implies a strong connection to one of the existing members. We consider two related problem formalizations that reflect the assumptions of our setting: a small number of STC violations and strong-tie connectivity in the input communities. We show that both problem formulations are NP-hard. We also show that one problem formulation is hard to approximate, while for the second we develop an algorithm with an approximation guarantee. We validate the proposed method on real-world datasets by comparing with baselines that optimize STC violations and community connectivity separately.
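
As a rough illustration of what an STC violation is, the sketch below counts them for a given strong/weak labeling. It assumes the usual reading of the principle (if a node has strong ties to two others, those two should themselves be connected); the function name and the toy graph are hypothetical, and this is only a violation counter, not the inference algorithm from the paper.

```python
from itertools import combinations


def stc_violations(edges, strong):
    """Count open triads u - v - w (no u-w edge) whose two edges are both strong.

    edges:  iterable of undirected edges (u, v)
    strong: iterable of the edges labeled strong; all others count as weak
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    strong_set = {frozenset(e) for e in strong}

    count = 0
    for v, nbrs in adj.items():
        # Every unordered pair of neighbours of v forms a triad centred at v.
        for u, w in combinations(sorted(nbrs), 2):
            is_open = w not in adj[u]
            both_strong = (
                frozenset((u, v)) in strong_set
                and frozenset((v, w)) in strong_set
            )
            if is_open and both_strong:
                count += 1
    return count


# Toy graph: a triangle a-b-c plus a pendant edge a-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
print(stc_violations(edges, strong=edges))      # every edge labeled strong
print(stc_violations(edges, strong=edges[:3]))  # a-d labeled weak
```

Labeling every edge strong leaves the triads b-a-d and c-a-d open, so each counts as a violation; marking a-d as weak removes both.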

Citations: 18

A Practical Exploration System for Search Advertising

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098041

P. Shah, Ming Yang, Sachidanand Alle, A. Ratnaparkhi, B. Shahshahani, Rohit Chandra

In this paper, we describe an exploration system that was implemented by the search-advertising team of a prominent web portal to address the cold ads problem. The cold ads problem refers to the situation where, when new ads are injected into the system by advertisers, the system is unable to assign an accurate quality to the ad (in our case, the click probability). As a consequence, the advertiser may suffer from low impression volumes for these cold ads, and the overall system may perform sub-optimally if the click probabilities for new ads are not learnt rapidly. We designed a new exploration system that was adapted to search advertising and the serving constraints of the system. In this paper, we define the problem, discuss the design details of the exploration system and new evaluation criteria, and present the performance metrics that we observed.

Citations: 11

Multi-view Learning over Retinal Thickness and Visual Sensitivity on Glaucomatous Eyes

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098194

Toshimitsu Uesaka, K. Morino, Hiroki Sugiura, Taichi Kiwaki, Hiroshi Murata, R. Asaoka, K. Yamanishi

Dense measurement of the visual field, which is necessary to detect glaucoma, is known to be very costly and labor intensive. Measuring retinal thickness can now be less costly than measuring the visual field. Thus, it is highly desirable to transform retinal-thickness data into visual-sensitivity data. In this paper, we propose two novel methods to estimate the sensitivity of the visual field at SITA-Standard mode 10-2 resolution using retinal-thickness data measured with optical coherence tomography (OCT). The first method, called Affine-Structured Non-negative Matrix Factorization (ASNMF), is able to cope with both the estimation of the visual field and the discovery of deep glaucoma knowledge. The second is based on Convolutional Neural Networks (CNNs) and demonstrates very high estimation performance. Both are multi-view learning methods because they use visual-field and retinal-thickness data simultaneously. We experimentally tested the performance of our methods from several perspectives. We found that ASNMF worked better for relatively small data sizes, while CNNs worked better for relatively large data sizes. In addition, some clinical knowledge was discovered via ASNMF. To the best of our knowledge, this is the first paper to address dense estimation of the visual field based on retinal-thickness data.

Citations: 7

Estimation of Recent Ancestral Origins of Individuals on a Large Scale

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098042

Ross E. Curtis, A. Girshick

The last ten years have seen an exponential growth of direct-to-consumer genomics. One popular feature of these tests is the report of a distant ancestral inference profile, a breakdown of the regions of the world where the test-taker's ancestors may have lived. While current methods and products generally focus on the more distant past (e.g., thousands of years ago), we have recently demonstrated that by leveraging network analysis tools such as community detection, more recent ancestry can be identified. However, using a network analysis tool like community detection on a large network with potentially millions of nodes is not feasible in a live production environment where hundreds or thousands of new genotypes are processed every day. In this study, we describe a classification method that leverages network features to assign individuals to communities in a large network corresponding to recent ancestry. We recently launched a beta version of this research as a new product feature at AncestryDNA.

Citations: 2

A Quasi-experimental Estimate of the Impact of P2P Transportation Platforms on Urban Consumer Patterns

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098058

Zhe Zhang, Beibei Li

With the pervasiveness of mobile technology and location-based computing, new forms of smart urban transportation, such as Uber & Lyft, have become increasingly popular. These new forms of urban infrastructure can influence individuals' movement frictions and patterns, in turn influencing local consumption patterns and the economic performance of local businesses. To gain insights about future impact of urban transportation changes, in this paper, we utilize a novel dataset and econometric analysis methods to present a quasi-experimental examination of how the emerging growth of peer-to-peer car sharing services may have affected local consumer mobility and consumption patterns.

Citations: 15

Resolving the Bias in Electronic Medical Records

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098149

Kaiping Zheng, Jinyang Gao, K. Ngiam, B. Ooi, J. Yip

Electronic Medical Records (EMR) are the most fundamental resources used in healthcare data analytics. Since people visit the hospital more frequently when they feel sick, and doctors prescribe lab examinations when they deem them necessary, we argue that there could be a strong bias in EMR observations compared with the hidden conditions of patients. Directly using such EMR for analytical tasks without considering the bias may lead to misinterpretation. To this end, we propose a general method to resolve the bias by transforming EMR into regular patient hidden-condition series using a Hidden Markov Model (HMM) variant. Compared with the biased EMR series with irregular time stamps, the unbiased regular time series is much easier for most analytical models to process and yields better results. Extensive experimental results demonstrate that our bias-resolving method imputes missing data more accurately than baselines and improves the performance of state-of-the-art methods on typical medical data analytics.

Citations: 45

Not All Passes Are Created Equal: Objectively Measuring the Risk and Reward of Passes in Soccer from Tracking Data

Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098051

P. Power, Héctor Ruiz, Xinyu Wei, P. Lucey

In soccer, the most frequent event is a pass. To a trained eye, there are a myriad of adjectives that could describe this event (e.g., from "majestic pass" and "conservative" to "poor ball"). However, because these events need to be coded live and in real time (most often by human annotators), the current method of grading passes is restricted to the binary labels 0 (unsuccessful) or 1 (successful). Obviously, this is sub-optimal because the quality of a pass needs to be measured on a continuous spectrum (i.e., 0 to 100%) and not as a binary value. Additionally, a pass can be measured across multiple dimensions, namely: i) risk -- the likelihood of executing a pass in a given situation, and ii) reward -- the likelihood of a pass creating a chance. In this paper, we show how we estimate both the risk and reward of a pass across two seasons of tracking data captured from a recent professional soccer league with state-of-the-art performance, then showcase various use cases of our deployed passing system.

Citations: 82
