Pub Date: 2019-09-12. DOI: 10.4236/jdaip.2019.74009
M. K. Peter, L. Mbugua, A. Wanjoya
The effect of a treatment on patient outcomes can be determined through the impact of the treatment on biological events. Observing treated patients over a period of time helps determine whether there is any change in a patient's biomarker. It is important to study how the biomarker changes due to treatment, and whether individuals located in separate centers can be clustered together, since the centers might have different distributions. The study is motivated by a Bayesian non-parametric mixture model, which is more flexible than Bayesian parametric models and is capable of borrowing information across different centers, allowing them to be grouped together. To this end, this research modeled biological markers while taking surrogate markers into consideration. The study employed the nested Dirichlet process prior, which easily places different distributions on the several centers, with centers drawn from the same Dirichlet process component automatically clustered together. The study sampled from the posterior using a Markov chain Monte Carlo (MCMC) algorithm. The model is illustrated with a simulation study, which showed that the model was capable of clustering the data into distinct clusters.
Title: Bayesian Non-Parametric Mixture Model with Application to Modeling Biological Markers
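The Dirichlet process prior at the heart of the nested model can be illustrated with a truncated stick-breaking construction, which generates the random mixture weights that let observations share components. The concentration parameter and truncation level below are illustrative choices, not values from the paper.

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking construction of Dirichlet process weights."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        b = rng.betavariate(1.0, alpha)  # Beta(1, alpha) stick fraction
        weights.append(remaining * b)
        remaining *= (1.0 - b)
    weights.append(remaining)  # last atom absorbs the leftover mass
    return weights

rng = random.Random(0)
w = stick_breaking_weights(alpha=1.0, truncation=50, rng=rng)
```

With a small `alpha`, most of the mass falls on the first few atoms, which is what drives the automatic clustering of centers sharing a component.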
Pub Date: 2019-09-12. DOI: 10.4236/jdaip.2019.74015
Benson Kituku, Wanjiku Ng'ang'a, Lawrence Muchemi
The under-resourced Kikamba language has few language technology tools, since the more efficient and popular data-driven approaches for developing them suffer from data sparseness due to the lack of digitized corpora. To address this challenge, we have developed a computational grammar for the Kikamba language within the multilingual Grammatical Framework (GF) toolkit. GF uses the interlingua rule-based translation approach. To develop the grammar, we used a morphology-driven strategy: we first developed regular expressions for morphological inflection and thereafter developed the syntax rules. The grammar was evaluated using one hundred sentences in both English and Kikamba. The results were an encouraging 4-gram BLEU score of 83.05% and a position-independent error rate (PER) of 10.96%. Finally, we have contributed language technology resources for Kikamba, including multilingual machine translation, a morphology analyzer, and a computational grammar that provides a platform for developing multilingual applications and for generating a variety of bilingual corpora between Kikamba and all languages currently defined in GF, making it easier to experiment with data-driven approaches.
Title: Towards Kikamba Computational Grammar
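The regular-expression inflection step can be sketched as prefix rewrite rules of the kind used for Bantu noun classes. The prefix pairs below are a simplified illustration (the `mundu`/`andu` person/people pair is common across Bantu languages), not the actual inflection tables developed for the Kikamba grammar.

```python
import re

# Hypothetical singular -> plural noun-class prefix pairs (illustrative only;
# not the actual rule set from the Kikamba computational grammar).
PREFIX_PAIRS = [("mu", "a"), ("ki", "i")]

def pluralize(noun):
    """Apply the first matching prefix rewrite rule; pass through otherwise."""
    for sg, pl in PREFIX_PAIRS:
        m = re.match(rf"^{sg}(.+)$", noun)
        if m:
            return pl + m.group(1)
    return noun  # no rule matched

result = pluralize("mundu")
```

In GF these rules would live in the concrete syntax as smart paradigms; the regex form above only conveys the morphology-driven idea.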
Pub Date: 2019-09-12. DOI: 10.4236/jdaip.2019.74016
Ioannis Karamitsos, Saeed Albarhami, Charalampos Apostolopoulos
The availability and advancement of cloud computing service models such as IaaS, SaaS, and PaaS, which introduce on-demand self-service, auto-scaling, easy maintenance, and pay-as-you-go pricing, have dramatically transformed the way organizations design and operate their datacenters. However, some organizations still have concerns, such as security, governance, lack of expertise, and migration. The purpose of this paper is to discuss cloud computing customers' opinions, feedback, attitudes, and emotions towards cloud computing services using sentiment analysis. The associated aim is to help people and organizations understand the benefits and challenges of cloud services from the general public's perspective, as well as opinions about existing cloud providers, focusing on three main cloud providers: Azure, Amazon Web Services (AWS) and Google Cloud. The methodology is based on sentiment analysis applied to tweets extracted from the social media platform Twitter via its search API. We extracted a sample of 11,000 tweets, with each cloud provider represented in roughly equal proportion based on relevant hashtags and keywords. The analysis starts by combining the tweets to find the overall polarity about cloud computing, then splits the tweets to find the specific polarity for each cloud provider. The Bing and NRC lexicons are employed to measure the polarity and emotion of the terms in the tweets. The overall polarity classification of the tweets across all cloud providers shows 68.5% positive and 31.5% negative percentages. More specifically, Azure shows 63.8% positive and 36.2% negative tweets, Google Cloud shows 72.6% positive and 27.4% negative tweets, and AWS shows 69.1% positive and 30.9% negative tweets.
Title: Tweet Sentiment Analysis (TSA) for Cloud Providers Using Classification Algorithms and Latent Semantic Analysis
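Lexicon-based polarity scoring of the kind performed with the Bing lexicon can be sketched as counting positive and negative terms per tweet. The tiny word lists and sample tweets below are made up for illustration; they are not the paper's lexicon or dataset.

```python
# Tiny illustrative polarity lexicon in the spirit of the Bing lexicon
# (hypothetical words, not the actual lexicon used in the paper).
POSITIVE = {"reliable", "fast", "great", "scalable"}
NEGATIVE = {"outage", "slow", "expensive", "down"}

def polarity(tweet):
    """Label a tweet by the difference of positive and negative word counts."""
    tokens = tweet.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tweets = ["AWS is fast and reliable", "Azure outage again so slow"]
labels = [polarity(t) for t in tweets]
```

Aggregating such labels per provider yields the positive/negative percentages reported in the abstract.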
Pub Date: 2019-09-12. DOI: 10.4236/jdaip.2019.74014
Hiroshi Ogura, Hiromi Amano, Masato Kondo
In a previous study, we introduced dynamical aspects of written texts by regarding the serial sentence number, from the first to the last sentence of a given text, as discretized time. Using this definition of a textual timeline, we defined an autocorrelation function (ACF) for word occurrences and demonstrated its utility both for representing dynamic word correlations and for measuring word importance within the text. In this study, we seek a stochastic process governing the occurrences of a given word having strong dynamic correlations. This is valuable because words exhibiting strong dynamic correlations play a central role in developing or organizing textual contexts. While seeking this stochastic process, we find that additive binary Markov chain theory is useful for describing strong dynamic word correlations, in the sense that it can reproduce characteristics of autocovariance functions (an unnormalized version of ACFs) observed in actual written texts. Using this theory, we propose a model of time-varying probability that describes the probability of word occurrence in each sentence of a text. The proposed model considers hierarchical document structures such as chapters, sections, subsections, paragraphs, and sentences. Because such a hierarchical structure is common to most documents, our model for the occurrence probability of words has wide applicability for interpreting dynamic word correlations in actual written texts. The main contributions of this study are, therefore, demonstrating the usability of additive binary Markov chain theory for analyzing dynamic correlations in written texts, and offering a new model of word occurrence probability that takes the common hierarchical structure of documents into account.
Title: Origin of Dynamic Correlations of Words in Written Texts
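The autocovariance used to diagnose dynamic correlations can be computed directly from a word's binary occurrence sequence over the sentence timeline. The sequence below is a toy example of a bursty word (not taken from the paper's corpus), showing a positive short-lag autocovariance.

```python
def autocovariance(x, lag):
    """Sample autocovariance of an occurrence sequence at a given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / (n - lag)

# Toy binary sequence: 1 if the word appears in sentence t (illustrative).
occ = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
acov = [autocovariance(occ, k) for k in range(4)]
```

A word whose occurrences cluster in runs, as above, yields `acov[1] > 0`; an uncorrelated word's autocovariance drops to roughly zero at all positive lags.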
Pub Date: 2019-08-15. DOI: 10.4236/JDAIP.2019.73008
Raed A. Salha, Maher A. El-Hallaq, Abdelkhalek I. Alastal
The ultimate aim of a smart city is to enhance the quality of life of its residents and businesses through modern technologies, in order to reduce resource deterioration and contain overall costs. From this perspective, blockchain is one technology that has received much attention in recent years, as it offers new alternatives for individuals and institutions in the smart city context. This study explores the potential and contribution of blockchain in smart cities by reviewing the scientific literature on the concept and fundamentals of blockchain, including its most practical applications. In addition, it summarizes worldwide examples of success in using blockchain, and explores the challenges and opportunities related to this technology in smart cities. Thus, this study provides a useful reference for researchers reviewing blockchain technology.
Title: Blockchain in Smart Cities: Exploring Possibilities in Terms of Opportunities and Challenges
Pub Date: 2019-07-26. DOI: 10.4236/JDAIP.2019.73007
Wen-rong Pan, D. Lai, Yu Song, J. Follis
Unprecedented industrialization and urbanization have led to China's poor energy efficiency. In response, the Chinese government has set goals to reduce energy consumption that may include implementing new tax policies. In this paper, we investigate the relationship between energy intensity, an indicator that measures the efficiency of energy consumption, and two sources of government revenue in China: value-added tax (VAT) and corporate income tax. As a case study, we developed a Granger co-integration model to analyze the dynamic relationship of energy intensity, VAT, and corporate income tax in the non-ferrous metal industry of Jiangxi Province, China, between 1996 and 2010. Augmented Dickey-Fuller tests were used to validate the model. In our time series analyses, we found that, when controlling for corporate income tax, a one-log-unit increase in VAT resulted in a decrease of 1.17 log units in energy intensity. However, when controlling for VAT, a one-log-unit increase in corporate income tax resulted in an increase of 0.34 log units in energy intensity. Understanding the relationship between energy intensity and taxation in industries that consume high volumes of energy can greatly support China's goal of reducing energy consumption. We believe our findings add to this ongoing discussion.
Title: Time Series Analysis of Energy Intensity, Value Added Tax and Corporate Income Tax: A Case Study of the Non-Ferrous Metal Industry, Jiangxi Province, China
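The log-unit interpretation of the reported coefficients can be illustrated with a simple least-squares fit on log-transformed data. The series below is synthetic, constructed with a known elasticity of -1.17 to mirror the reported VAT effect; the paper itself estimates this from real data via Granger co-integration, not the toy regression shown here.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic log-log data: one log unit of VAT -> -1.17 log units of intensity.
log_vat = [1.0, 1.5, 2.0, 2.5, 3.0]
log_intensity = [5.0 - 1.17 * v for v in log_vat]
slope = ols_slope(log_vat, log_intensity)
```

Because both variables are in logs, the slope is an elasticity: a 1% rise in VAT is associated with roughly a 1.17% fall in energy intensity, holding the other tax fixed.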
Pub Date: 2019-03-04. DOI: 10.4236/JDAIP.2019.72003
S. Hossain, Omor Faruque
The main aim of this research is to characterize the road traffic accident scenario, its injurious effects, and its patterns in Bangladesh. Moreover, we forecast the magnitude of road traffic accidents so that decision makers can take appropriate precautionary measures. The study provides an assessment of road traffic accidents in Bangladesh and their impact, based on data collected for the period 1971 to 2017, and identifies the main causes of road accidents. The study observed that the numbers of road traffic accidents (RTAs), deaths, and injuries increased gradually, with small fluctuations, from 1971 to 2007, followed by a slow decreasing trend after 2007. Although the numbers of RTAs and deaths have shown a decreasing trend in recent years, the ratio of deaths to accidents increased significantly. The rate of registered vehicles per 10,000 people increased moderately throughout the period, with a sharp increase from 2009. The highest percentages of RTAs (34%) and RTA deaths (32%) occur in Dhaka division, while the lowest percentage of RTAs (4%) is found in Barisal and Sylhet divisions and the lowest percentage of RTA deaths (3%) in Barisal division. The maximum number of injuries occurred between ages 21 and 30, while the maximum number of deaths occurred between ages 11 and 30. Most RTAs and RTA deaths are caused by vehicles running over pedestrians and by head-on collisions. The severity of road accidents and the number of deaths are higher during festive periods because of the higher frequency of travel. The time plot shows a decreasing movement from 2012 to 2015 but an increase from 2015 to 2017. An additive time series model approach is applied, including the estimation of trend, seasonal variation, and random variation using the triple exponential smoothing method. We forecast RTAs, eliminating seasonal impact, for the next three consecutive years (2018-2020) with 95% confidence intervals using Holt-Winters exponential smoothing.
Title: Road Traffic Accident Scenario, Pattern and Forecasting in Bangladesh
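The Holt-Winters triple exponential smoothing used for the forecast can be sketched as follows. The additive variant, the toy series, the period, and the smoothing constants below are all illustrative assumptions; the paper's actual series, seasonal period, and fitted parameters are not reproduced here.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Minimal additive Holt-Winters (triple exponential) smoothing forecast."""
    # Initialize level, trend, and seasonal components from the first seasons.
    level = sum(series[:period]) / period
    trend = sum(series[period + i] - series[i] for i in range(period)) / period ** 2
    seasonal = [series[i] - level for i in range(period)]
    for t in range(period, len(series)):
        s = seasonal[t % period]
        last_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (series[t] - level) + (1 - gamma) * s
    return [level + (h + 1) * trend + seasonal[(len(series) + h) % period]
            for h in range(horizon)]

# Toy series with period-4 seasonality and a linear trend (illustrative).
data = [10, 14, 8, 12, 14, 18, 12, 16, 18, 22, 16, 20]
forecast = holt_winters_additive(data, period=4, alpha=0.5, beta=0.3, gamma=0.4,
                                 horizon=4)
```

The forecast extends both the estimated trend and the repeating seasonal pattern, which is the "eliminating seasonal impact" step described in the abstract; confidence intervals would come from the residual variance.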
Pub Date: 2019-03-04. DOI: 10.4236/JDAIP.2019.72004
Hiroshi Ogura, Hiromi Amano, Masato Kondo
In this study, we regard written texts as time series data and investigate dynamic correlations of word occurrences by utilizing an autocorrelation function (ACF). After defining an appropriate formula for the ACF that is suitable for expressing the dynamic correlations of words, we use the formula to calculate ACFs for frequent words in 12 books. The ACFs obtained can be classified into two groups: one group of ACFs shows dynamic correlations, with these ACFs well described by a modified Kohlrausch-Williams-Watts (KWW) function; the other group of ACFs shows no correlations, with these ACFs fitted by a simple stepdown function. A word having the former ACF is called a Type-I word and a word with the latter ACF is called a Type-II word. It is also shown that the ACFs of Type-II words can be derived theoretically by assuming that the stochastic process governing word occurrence is a homogeneous Poisson point process. Based on the fitting of the ACFs by KWW and stepdown functions, we propose a measure of word importance which expresses the extent to which a word is important in a particular text. The validity of the measure is confirmed using Kleinberg's burst detection algorithm.
Title: Measuring Dynamic Correlations of Words in Written Texts with an Autocorrelation Function
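The Type-II case can be checked numerically: under a homogeneous Poisson point process, per-sentence occurrences reduce to independent Bernoulli draws, so the normalized ACF should be near zero at all positive lags. The occurrence probability and sequence length below are illustrative, not fitted to any of the 12 books.

```python
import random

def autocovariance(x, lag):
    """Sample autocovariance of an occurrence sequence at a given lag."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / (n - lag)

# Homogeneous Poisson process over sentences -> independent Bernoulli
# occurrences per sentence (probability 0.2, illustrative).
rng = random.Random(42)
occ = [1 if rng.random() < 0.2 else 0 for _ in range(20000)]
var = autocovariance(occ, 0)
acf_lag5 = autocovariance(occ, 5) / var  # normalized ACF at lag 5
```

A Type-I word, by contrast, would show a slowly decaying ACF shaped like the modified KWW function rather than this flat profile.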
Pub Date: 2019-03-04. DOI: 10.4236/JDAIP.2019.72005
Xionghui Wen
In view of the lack of patent big data in research on technology foresight in the industrial robot field, this paper introduces an improved method based on patent mining and knowledge mapping. First, subject-action-object (SAO) structures are extracted from selected patents; second, the similarity between patents is calculated from the extracted SAO structures; third, a patent network and a patent map are drawn from the resulting patent similarity matrix. The technology evolution process and future trends of industrial robots are summarized from the patent network, and potential future technology opportunities are predicted from technological vacancies identified in the patent map. Finally, this paper identifies six key technical areas and four potential technical opportunities in the field of industrial robots.
Title: Technology Foresight Research of Industrial Robot Based on Patent Analysis
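The SAO-based similarity step can be sketched as a set-overlap measure over extracted triples. The triples below are hypothetical examples, and Jaccard overlap stands in for whatever SAO similarity measure the paper actually uses.

```python
# Hypothetical SAO (subject-action-object) triples for two patents
# (illustrative; not extracted from real patent text).
patent_a = {("arm", "grips", "workpiece"), ("sensor", "detects", "position"),
            ("controller", "adjusts", "torque")}
patent_b = {("arm", "grips", "workpiece"), ("camera", "detects", "defect")}

def sao_similarity(a, b):
    """Jaccard similarity between two sets of SAO triples."""
    return len(a & b) / len(a | b) if a | b else 0.0

sim = sao_similarity(patent_a, patent_b)
```

Computing this pairwise over all selected patents yields the similarity matrix from which the patent network and patent map are drawn.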
Pub Date: 2019-02-22. DOI: 10.4236/jdaip.2019.71002
A. Williams
Penalized ordinal outcome models were developed to model high-dimensional data with ordinal outcomes. One option is the penalized stereotype logit, which includes nonlinear combinations of parameter estimates, yet optimization algorithms that assume linearity and function convexity have previously been applied to fit this model. This study proposes applying the adaptive moment estimation (Adam) optimizer, which is suited to nonlinear optimization, to the elastic net penalized stereotype logit model. The proposed model is compared to the L1-penalized ordinalgmifs stereotype model. Both methods were applied to simulated and real data, with non-Hodgkin lymphoma (NHL) cancer subtypes as the outcome, and the results are presented and discussed.
Title: Ordinal Outcome Modeling: The Application of the Adaptive Moment Estimation Optimizer to the Elastic Net Penalized Stereotype Logit
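The core optimization idea can be sketched with Adam updates on a one-dimensional logistic loss carrying an elastic net penalty. This is a binary logistic stand-in for the stereotype logit, with made-up data and hyperparameters, intended only to show the moment-estimate updates, not the paper's model.

```python
import math

def adam_elastic_net_logit(xs, ys, lam=0.1, l1_ratio=0.5, lr=0.05, steps=2000):
    """Adam on a 1-D logistic loss with an elastic net penalty (illustrative)."""
    w, m, v = 0.0, 0.0, 0.0
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        # Gradient of the mean logistic loss plus the elastic net subgradient.
        g = sum((1 / (1 + math.exp(-w * x)) - y) * x
                for x, y in zip(xs, ys)) / len(xs)
        g += lam * (l1_ratio * (1 if w > 0 else -1 if w < 0 else 0)
                    + (1 - l1_ratio) * w)
        m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment (uncentered var) estimate
        mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)  # bias correction
        w -= lr * mhat / (math.sqrt(vhat) + eps)
    return w

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w = adam_elastic_net_logit(xs, ys)
```

The bias-corrected moment estimates are what make Adam robust on nonconvex, nonsmooth objectives like this penalized loss, which is the motivation the abstract gives for preferring it over convexity-assuming optimizers.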