Pub Date: 2020-09-05. DOI: 10.1142/s2424922x20410041
R. Francoeur
Multiple regression is not reliable for recovering predictor slopes within homogeneous subgroups of heterogeneous samples. In contrast to Monte Carlo analysis, which assigns to the first-specified predictor all of the variation it shares with the remaining predictors, multiple regression assigns this shared variation to no predictor, and it is sequestered in the residual term. This unassigned, confounding variation may correlate with specified predictors, lead to heteroscedasticity, and distort multicollinearity. I develop and test an iterative, sequential algorithm that estimates a two-part series of weighted least-squares (WLS) multiple regressions to recover the Monte Carlo predictor slopes in three homogeneous subgroups (each generated with 500 observations) of a heterogeneous sample [Formula: see text]. Each variable has a different nonnormal distribution. The algorithm mines each subgroup and then adjusts bias within it from (1) heteroscedasticity related to one, some, or all specified predictors and (2) “nonessential” multicollinearity. It recovers all three specified predictor slopes across the three subgroups in two scenarios, one of which is also influenced by two unspecified predictors. The algorithm extends adaptive analysis to discover and appraise patterns in field research and machine learning when predictors are inter-correlated, and even unspecified, in order to reveal unbiased outcome clusters in heterogeneous and homogeneous samples with nonnormal outcomes and predictors.
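The WLS idea in this abstract can be illustrated with a generic two-step feasible weighted least-squares sketch (this is not the paper's iterative, subgroup-mining algorithm; the data, coefficients, and variance model below are hypothetical): fit OLS, model the residual variance as a function of the predictors, then refit with inverse-variance weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
# heteroscedastic noise: variance grows with the magnitude of the first predictor
sigma = 0.5 + np.abs(x[:, 0])
y = 2.0 + 1.5 * x[:, 0] - 0.8 * x[:, 1] + rng.normal(scale=sigma)

X = np.column_stack([np.ones(n), x])

def ols(X, y):
    """Ordinary least squares via the normal-equation solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_ols = ols(X, y)
resid = y - X @ beta_ols
# model the log squared residuals on the predictors to estimate the variance function
gamma = ols(X, np.log(resid**2 + 1e-8))
w = 1.0 / np.exp(X @ gamma)            # weights = inverse estimated variances
sw = np.sqrt(w)
beta_wls = ols(X * sw[:, None], y * sw)
```

With 500 observations the reweighted fit recovers the generating slopes (1.5 and -0.8) closely even though the error variance depends on a predictor.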
Title: "Data-Mining Homogeneous Subgroups in Multiple Regression When Heteroscedasticity, Multicollinearity, and Missing Variables Confound Predictor Effects". Advances in Data Science and Adaptive Analysis, pp. 2041004:1-2041004:59.
Pub Date: 2020-08-13. DOI: 10.1142/s2424922x2041003x
A. A. Neloy, S. Alam, R. A. Bindu
Recommender Systems (RSs) have become an essential part of most e-commerce sites. Although several studies have been conducted on RSs, a hybrid recommender system for a real-estate search engine that finds suitable rental apartments while taking user preferences into account is still lacking. To address this problem, this paper proposes a hybrid recommender system constructed from two of the most popular recommendation approaches: Collaborative Filtering (CF) and a Content-Based Recommender (CBR). CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, these ratings are often very sparse in applications such as a search engine, degrading the accuracy and performance of CF-based methods. To reduce this sparsity problem in the CF method, the Cosine Similarity Score (CSS) between the user and the predicted apartment, computed from their Feature Vectors (FV) in the CBR module, is utilized. An improved and optimized Singular Value Decomposition (SVD) with bias Matrix Factorization (MF) in the CF model, combined with the CSS over FVs from the CBR, constitutes this hybrid recommender. The proposed recommender was evaluated using statistical cross-validation, specifically Leave-One-Out Cross-Validation (LOOCV). Experimental results show that it significantly outperformed a benchmark random recommender in terms of precision and recall. In addition, a graphical analysis of the relationship between accuracy and error minimization is presented to provide further evidence of the potential of this hybrid recommender system.
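The CF/CBR blend described here can be sketched in a few lines: compute a cosine similarity between feature vectors, then combine it with a CF rating prediction. The feature vectors, the rating value, and the 50/50 blend weight below are all hypothetical placeholders, not the paper's fitted model.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# hypothetical feature vectors for a user profile and a candidate apartment
user_fv = np.array([1.0, 0.0, 2.0, 1.0])
apartment_fv = np.array([0.5, 0.0, 1.0, 1.0])
css = cosine_similarity(user_fv, apartment_fv)

# one simple way to blend a CF rating prediction with the CBR similarity
svd_rating = 3.8                           # hypothetical rating on a 1-5 scale
hybrid_score = 0.5 * (svd_rating / 5.0) + 0.5 * css
```

When CF ratings are sparse, the content-based term keeps the hybrid score informative for items the user has never rated.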
Title: "Design and Implementation of a Novel Hybrid Rental Apartment Recommender System". Advances in Data Science and Adaptive Analysis, pp. 2041003:1-2041003:17.
Pub Date: 2020-08-13. DOI: 10.1142/s2424922x20410016
M. Diván, M. Reynoso
Real-time data analysis requires an integrated approach for knowing the last known state of the variables of a concept under monitoring. Internet-of-Things (IoT) devices provide alternatives for implementing distributed data-collection strategies. However, the autonomy of IoT devices is one of the main challenges in implementing a collection strategy: battery autonomy is directly affected by the energy consumed by data transmissions. The Data Stream Processing Strategy (DSPS) is an architecture oriented to the implementation of measurement projects based on a measurement and evaluation framework. Its online processing is guided by measurement metadata reported by IoT devices associated with a component named the Measurement Adapter (MA). This paper presents a new data-buffer organization based on measurement metadata, articulated with online data filtering, to optimize data transmissions from the MA. As contributions, a weighted data-change detection approach is incorporated, and a new local buffer based on logical windows is proposed for the MA. Also, an articulation among the data buffer, a temporal barrier, and the data-change detectors is introduced. The proposal was implemented and released in the pabmmCommons library. A discrete simulation on the library is described to provide initial applicability patterns: the data buffer consumed 568 Kb while monitoring 100 simultaneous metrics, and the online estimation of the mean and variance based on statistical process control consumed 238 ns. As a limitation, other scenarios need to be addressed before generalizing these results. As future work, new alternatives for filtering noise online will be explored.
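The combination of a logical-window buffer with weighted change detection can be sketched as follows. This is an illustrative toy, not the pabmmCommons implementation: the class name, window size, threshold, and weight are all assumptions.

```python
from collections import deque

class ChangeDetector:
    """Keep a logical window of recent values for one metric and flag
    transmission only when the weighted relative change is large enough."""
    def __init__(self, window=10, threshold=0.1, weight=1.0):
        self.buf = deque(maxlen=window)    # logical window: bounded by count, not time
        self.threshold = threshold
        self.weight = weight

    def offer(self, value):
        """Return True if this reading should be transmitted."""
        if not self.buf:
            self.buf.append(value)
            return True                    # always send the first reading
        mean = sum(self.buf) / len(self.buf)
        change = self.weight * abs(value - mean) / (abs(mean) + 1e-9)
        self.buf.append(value)
        return change > self.threshold

det = ChangeDetector(window=5, threshold=0.05)
sent = [det.offer(v) for v in [20.0, 20.1, 20.05, 25.0, 20.0]]
```

Only the first reading and the two readings that deviate noticeably from the window mean would be transmitted, saving the energy of the two near-duplicate transmissions.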
Title: "Optimizing Data Transmission from IoT Devices Through Weighted Online Data Changing Detectors". Advances in Data Science and Adaptive Analysis, pp. 2041001:1-2041001:33.
Pub Date: 2020-08-13. DOI: 10.1142/s2424922x20410028
K. Kumar, Umair Saeed, Athaul Rai, Noman Islam, G. Shaikh, A. Qayoom
During the past few years, deep learning (DL) architectures have been employed in many areas, such as object detection, face recognition, natural language processing, medical image analysis, and other related applications. In these applications, DL has achieved remarkable results, matching the performance of human experts. This paper presents a novel convolutional neural network (CNN)-based approach for the detection of breast cancer in invasive ductal carcinoma (IDC) tissue regions using whole-slide images (WSI). Breast cancer has been a leading cause of death among women, and locating malignant regions in WSI remains a demanding task for pathologists. In this research, we implemented several CNN models, including VGG16, VGG19, Xception, Inception V3, MobileNetV2, ResNet50, and DenseNet. The experiments were performed on a standard WSI data set comprising 163 IDC patients; for performance evaluation, the data set was divided into 113 training and 49 testing images. The testing was carried out separately for each model, and the results showed that our proposed CNN model achieved 83% accuracy, better than the other models.
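The basic building block shared by all the CNN architectures listed above is a convolution followed by a nonlinearity. As a minimal sketch (a hand-rolled 2-D convolution on a tiny synthetic image, not any of the paper's trained networks):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)   # toy "tissue patch"
edge = np.array([[-1.0, -1.0], [1.0, 1.0]])      # horizontal-gradient kernel
feat = np.maximum(conv2d(img, edge), 0.0)        # convolution + ReLU
```

A real WSI pipeline stacks many such filter banks with pooling and learns the kernels from labeled patches; frameworks provide this, but the arithmetic is the same.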
Title: "IDC Breast Cancer Detection Using Deep Learning Schemes". Advances in Data Science and Adaptive Analysis, pp. 2041002:1-2041002:19.
Pub Date: 2020-01-15. DOI: 10.1002/9781119695110.index
Title: "Index". Advances in Data Science and Adaptive Analysis (no abstract).
Pub Date: 2020-01-15. DOI: 10.1002/9781119695110.oth
Title: "Other titles from iSTE in Innovation, Entrepreneurship and Management". Advances in Data Science and Adaptive Analysis (no abstract).
Pub Date: 2020-01-01. DOI: 10.1142/s2424922x20500047
Ryo Hanafusa, T. Okadome
In regression with noisy inputs, noise is typically removed from a given noisy input if possible, and the resulting noise-free input is then provided to the regression function. In some cases, however, no time or method is available for removing the noise. The regression method proposed in this paper determines a regression function for noisy inputs using the estimated posterior of their noise-free constituents, with a nonparametric estimator for noiseless explanatory values constructed from noiseless training data. In addition, a probabilistic generative model is presented for estimating the noise distribution. This makes it possible to determine the noise distribution parametrically from a single noisy input, using as a prior the distribution of the noise-free constituent of the noisy input estimated from the training data set. Experiments conducted using artificial and real data sets show that the proposed method suppresses overfitting of the regression function for noisy inputs, and the root mean squared errors (RMSEs) of its predictions are smaller than those of an existing method.
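The Nadaraya-Watson estimator that the title refers to is a kernel-weighted average of the training targets. A minimal sketch with a Gaussian kernel on synthetic data (the bandwidth and data are assumptions; the paper's Bayesian treatment of input noise is not reproduced here):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.3):
    """Gaussian-kernel Nadaraya-Watson regression estimate at x_query."""
    w = np.exp(-0.5 * ((x_query - x_train) / bandwidth) ** 2)
    return float(w @ y_train / w.sum())

rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)   # noisy targets
est = nadaraya_watson(x, y, np.pi / 2)               # estimate near the sine peak
```

The estimate at π/2 lands close to sin(π/2) = 1; the small downward bias is the usual kernel-smoothing bias at a local maximum, which shrinks with the bandwidth.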
Title: "Bayesian Kernel Regression for Noisy Inputs Based on Nadaraya-Watson Estimator Constructed from Noiseless Training Data". Advances in Data Science and Adaptive Analysis, pp. 2050004:1-2050004:17.
Pub Date: 2020-01-01. DOI: 10.1142/s2424922x20500035
A. Lasisi, Pengyu Li, Jian Chen
Highway-rail grade crossing (HRGC) accidents continue to be a major source of transportation casualties in the United States. This can be attributed to increased road and rail operations and/or the lack of adequate safety programs based on comprehensive HRGC accident analysis, among other reasons. The focus of this study is to predict HRGC accidents in a given rail network based on a machine learning analysis of a similar network with cognate attributes. This study improves on past studies that either attempt to predict accidents at a given HRGC or spatially analyze HRGC accidents for a particular rail line. Here, a case for a hybrid machine learning and geographic information systems (GIS) approach is presented for a large rail network. The study involves the collection and wrangling of relevant data from various sources, exploratory analysis, and supervised machine learning (classification and regression) of HRGC data from 2008 to 2017 in California. The models developed from this analysis were used to make binary predictions [98.9% accuracy and a 0.9838 Receiver Operating Characteristic (ROC) score] and quantitative estimates of HRGC casualties in a similar network over the next 10 years. While the results are presented spatially in GIS, this novel hybrid application of machine learning and GIS to HRGC accident analysis will help stakeholders proactively reduce casualties by addressing the major accident causes identified in this study. The paper concludes with a Systems-Action-Management (SAM) approach based on text analysis of HRGC accident risk reports from the Federal Railroad Administration.
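The ROC score reported above is the rank-based AUC: the probability that a randomly chosen positive (accident) case receives a higher model score than a randomly chosen negative case. A minimal sketch of that computation on hypothetical labels and scores (not the study's data or classifier):

```python
import numpy as np

def roc_auc(y_true, scores):
    """Rank-based AUC: fraction of positive/negative pairs ranked correctly,
    counting ties as half-correct."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([0, 0, 1, 1, 1, 0])              # hypothetical crash labels
s = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2]) # hypothetical model scores
auc = roc_auc(y, s)                           # 8 of 9 pairs ranked correctly
```

An AUC of 0.9838, as reported in the abstract, means the model ranks almost every accident crossing above almost every non-accident crossing.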
Title: "Hybrid Machine Learning and Geographic Information Systems Approach - A Case for Grade Crossing Crash Data Analysis". Advances in Data Science and Adaptive Analysis, pp. 2050003:1-2050003:30.
Pub Date: 2020-01-01. DOI: 10.1142/s2424922x20500023
S. P. Shary
For the data fitting problem under interval uncertainty, we introduce the concept of strong compatibility between data and parameters. It is shown that the new, strengthened formulation of the problem reduces to computing and estimating the so-called tolerable solution set of interval systems of equations constructed from the data being processed. We propose a computational technology for constructing a “best-fit” linear function from interval data that takes the strong compatibility requirement into account. The properties of the new data fitting approach are much better than those of its predecessors: strong compatibility estimates have polynomial computational complexity, their variance is almost always finite, and they are robust. An example considered in the concluding part of the paper illustrates some of these features.
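A simplified flavor of membership in the tolerable solution set can be sketched for the special case of point-valued inputs and interval-valued outputs: a parameter pair belongs to the set when its line passes through every observation interval. The data and function name below are illustrative assumptions, not the paper's formulation, which handles interval uncertainty in the system matrix as well.

```python
def strongly_compatible(a, b, data):
    """Check whether the line y = a*x + b passes through every
    interval [lo, hi] of the interval-valued observations."""
    return all(lo <= a * x + b <= hi for x, lo, hi in data)

# hypothetical observations: (x, y_lower, y_upper)
data = [(0.0, 0.8, 1.2), (1.0, 1.7, 2.3), (2.0, 2.8, 3.4)]
ok = strongly_compatible(1.0, 1.0, data)   # y = x + 1 fits every interval
```

Each such membership test is a set of linear inequalities in (a, b), which is one way to see why estimates over the tolerable solution set admit polynomial-time computation.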
Title: "Weak and Strong Compatibility in Data Fitting Problems Under Interval Uncertainty". Advances in Data Science and Adaptive Analysis, pp. 2050002:1-2050002:34.
Pub Date: 2020-01-01. DOI: 10.1142/s2424922x20500011
Y. Imamverdiyev, F. Abdullayeva
In this paper, a fault prediction method for oil well equipment based on the analysis of time-series data obtained from multiple sensors is proposed. The method is based on deep learning (DL): a comparative analysis of a single-layer long short-term memory (LSTM) network with a convolutional neural network (CNN) and a stacked LSTM is provided. To demonstrate the efficacy of the proposed method, experiments are conducted on a real data set obtained from eight sensors installed in oil wells. Compared with the single-layer LSTM model, the CNN and stacked LSTM predicted the faulty time series with minimal loss.
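The LSTM recurrence at the core of these models can be written out directly. A minimal NumPy sketch of one standard LSTM cell step run over a short random "sensor" sequence (random weights and toy dimensions, not the paper's trained networks):

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell.
    Gate order in the stacked pre-activations: input, forget, output, candidate."""
    z = W @ x + U @ h + b          # shape (4*d,)
    d = h.size
    i = 1.0 / (1.0 + np.exp(-z[:d]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[d:2*d]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*d:3*d]))   # output gate
    g = np.tanh(z[3*d:])                    # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
d, m = 3, 2                                 # hidden size, input size (toy values)
W = rng.normal(size=(4 * d, m))
U = rng.normal(size=(4 * d, d))
b = np.zeros(4 * d)
h, c = np.zeros(d), np.zeros(d)
for x in rng.normal(size=(5, m)):           # run over a short sensor sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

In a fault-prediction setting, the final hidden state h (or the sequence of states) feeds a small output layer that scores the window as normal or faulty; stacking cells means feeding each layer's h sequence as the next layer's input.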
Title: "Condition Monitoring of Equipment in Oil Wells using Deep Learning". Advances in Data Science and Adaptive Analysis, pp. 2050001:1-2050001:30.