Fitting Rank Order Data in the Age of Context

2018 IEEE Life Sciences Conference (LSC) | Pub Date: 2018-10-01 | DOI: 10.1109/LSC.2018.8572090

K. Dick, J. Green


Citations: 1

Abstract



Rank order data are pervasive in science and in our daily lived experience. With the advent of high-performance computing and the commensurate increase in available data, nonparametric curve fitting can capture the overall distribution of values and thereby enable the identification of exceptional points in large datasets. With a rank order structure, these distributions may exhibit a “knee” delineating a threshold between exceptional points and those of the baseline. Given an accurate characterization of the distribution of prediction scores, including careful identification of the knee, we have previously shown that predictive performance can be significantly improved by leveraging this “context”. This paper examines the nonparametric characterization of such distributions. Locally weighted regression (LOESS) is a widely used nonparametric approach to curve fitting. Here, we revisit the assumptions behind the selection of kernel functions for nonparametric curve fitting of biological and biomedical data exhibiting rare or exceptional instances. We propose a new linear asymmetric kernel function and compare it with the tricube kernel commonly used in LOESS. We evaluate its ability to fit rank order data in the domain of protein-protein interaction prediction. The proposed linear kernel significantly improved the predictive performance ($p < 0.001$) of two state-of-the-art predictors and promises to be widely applicable in related machine learning pipelines and nonparametric regression tasks.
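To make the kernel comparison concrete, the following is a minimal sketch of locally weighted linear regression with a pluggable kernel. The tricube weight is the standard one. The abstract does not define the proposed kernel, so `linear_asymmetric` and its `left_scale` parameter are hypothetical stand-ins for one possible asymmetric linear kernel, not the authors' definition. The synthetic scores in the demo are likewise illustrative only.

```python
import numpy as np


def tricube(u):
    """Standard tricube weight: (1 - |u|^3)^3 for |u| < 1, else 0."""
    a = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - a**3) ** 3


def linear_asymmetric(u, left_scale=0.5):
    """Hypothetical asymmetric linear kernel (an assumption, not the paper's).

    Weights decay linearly with scaled distance. Neighbours on the left
    (u < 0) are treated as 1/left_scale times farther away, so their
    weight reaches zero sooner than for neighbours on the right.
    """
    u = np.asarray(u, dtype=float)
    d = np.where(u < 0, -u / left_scale, u)
    return np.clip(1.0 - d, 0.0, None)


def local_linear_fit(x, y, kernel, frac=0.3):
    """Fit a weighted straight line around each x[i]; return fitted values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))  # neighbours per local window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest point (guard against 0).
        h = max(np.sort(dist)[k - 1], np.finfo(float).eps)
        w = kernel((x - x[i]) / h)
        # Weighted least squares on [1, x - x_i]; the intercept is the fit at x_i.
        design = np.column_stack([np.ones(n), x - x[i]])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted


if __name__ == "__main__":
    # Synthetic rank-ordered scores: a few exceptional values, then a baseline.
    rng = np.random.default_rng(0)
    ranks = np.arange(200.0)
    scores = 0.1 + 0.9 * np.exp(-ranks / 5.0) + rng.normal(0.0, 0.01, 200)
    fit_tri = local_linear_fit(ranks, scores, tricube, frac=0.1)
    fit_lin = local_linear_fit(ranks, scores, linear_asymmetric, frac=0.1)
    print("max |tricube - linear| difference:", np.max(np.abs(fit_tri - fit_lin)))
```

Because the kernel is passed in as a function, the same fitting loop serves both weightings, so any difference between the two fitted curves comes from the kernel alone. The sketch omits the robustness iterations that full LOESS implementations usually add.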
