Pub Date: 2019-09-12. DOI: 10.4236/jdaip.2019.74009
M. K. Peter, L. Mbugua, A. Wanjoya
The effect of a treatment on patient outcomes can be determined through the impact of the treatment on biological events. Observing treated patients over a period of time helps determine whether there is any change in a patient's biomarker. It is important to study how the biomarker changes due to treatment, and whether individuals located in separate centers can be clustered together, since the centers might have different distributions. The study is motivated by a Bayesian non-parametric mixture model, which is more flexible than Bayesian parametric models and is capable of borrowing information across different centers, allowing them to be grouped together. To this end, this research modeled biological markers while taking surrogate markers into consideration. The study employed the nested Dirichlet process prior, which easily places different distributions on the several centers, with centers drawn from the same Dirichlet process component automatically clustered together. The study sampled from the posterior using a Markov chain Monte Carlo (MCMC) algorithm. The model is illustrated with a simulation study, which showed that the model was capable of clustering the data into distinct clusters.
Title: Bayesian Non-Parametric Mixture Model with Application to Modeling Biological Markers
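The Dirichlet process prior at the heart of the nested model can be illustrated with a truncated stick-breaking construction, which generates the random mixture weights that let observations share components. The concentration parameter and truncation level below are illustrative choices, not values from the paper.

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking construction of Dirichlet process weights."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        b = rng.betavariate(1.0, alpha)  # Beta(1, alpha) stick fraction
        weights.append(remaining * b)
        remaining *= (1.0 - b)
    weights.append(remaining)  # last atom absorbs the leftover mass
    return weights

rng = random.Random(0)
w = stick_breaking_weights(alpha=1.0, truncation=50, rng=rng)
```

With a small `alpha`, most of the mass falls on the first few atoms, which is what drives the automatic clustering of centers sharing a component.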
Pub Date: 2019-09-12. DOI: 10.4236/jdaip.2019.74015
Benson Kituku, Wanjiku Ng'ang'a, Lawrence Muchemi
The under-resourced Kikamba language has few language technology tools, since the more efficient and popular data-driven approaches for developing them suffer from data sparseness due to the lack of digitized corpora. To address this challenge, we have developed a computational grammar for the Kikamba language within the multilingual Grammatical Framework (GF) toolkit. GF uses the interlingua rule-based translation approach. To develop the grammar, we used a morphology-driven strategy: we first developed regular expressions for morphological inflection and thereafter developed the syntax rules. The grammar was evaluated using one hundred sentences in both English and Kikamba. The results were an encouraging 4-gram BLEU score of 83.05% and a position-independent error rate (PER) of 10.96%. Finally, we have contributed language technology resources for Kikamba, including multilingual machine translation, a morphology analyzer, and a computational grammar that provides a platform for developing multilingual applications and for generating a variety of bilingual corpora between Kikamba and all languages currently defined in GF, making it easier to experiment with data-driven approaches.
Title: Towards Kikamba Computational Grammar
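The regular-expression inflection step can be sketched as prefix rewrite rules of the kind used for Bantu noun classes. The prefix pairs below are a simplified illustration (the `mundu`/`andu` person/people pair is common across Bantu languages), not the actual inflection tables developed for the Kikamba grammar.

```python
import re

# Hypothetical singular -> plural noun-class prefix pairs (illustrative only;
# not the actual rule set from the Kikamba computational grammar).
PREFIX_PAIRS = [("mu", "a"), ("ki", "i")]

def pluralize(noun):
    """Apply the first matching prefix rewrite rule; pass through otherwise."""
    for sg, pl in PREFIX_PAIRS:
        m = re.match(rf"^{sg}(.+)$", noun)
        if m:
            return pl + m.group(1)
    return noun  # no rule matched

result = pluralize("mundu")
```

In GF these rules would live in the concrete syntax as smart paradigms; the regex form above only conveys the morphology-driven idea.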
Pub Date: 2019-09-12. DOI: 10.4236/jdaip.2019.74016
Ioannis Karamitsos, Saeed Albarhami, Charalampos Apostolopoulos
The availability and advancement of cloud computing service models such as IaaS, SaaS, and PaaS, which introduce on-demand self-service, auto-scaling, easy maintenance, and pay-as-you-go pricing, have dramatically transformed the way organizations design and operate their datacenters. However, some organizations still have concerns, such as security, governance, lack of expertise, and migration. The purpose of this paper is to discuss cloud computing customers' opinions, feedback, attitudes, and emotions towards cloud computing services using sentiment analysis. The associated aim is to help people and organizations understand the benefits and challenges of cloud services from the general public's perspective, as well as opinions about existing cloud providers, focusing on three main cloud providers: Azure, Amazon Web Services (AWS) and Google Cloud. The methodology is based on sentiment analysis applied to tweets extracted from the social media platform Twitter via its search API. We extracted a sample of 11,000 tweets, with each cloud provider represented in roughly equal proportion based on relevant hashtags and keywords. The analysis starts by combining the tweets to find the overall polarity about cloud computing, then splits the tweets to find the specific polarity for each cloud provider. The Bing and NRC lexicons are employed to measure the polarity and emotion of the terms in the tweets. The overall polarity classification of the tweets across all cloud providers shows 68.5% positive and 31.5% negative percentages. More specifically, Azure shows 63.8% positive and 36.2% negative tweets, Google Cloud shows 72.6% positive and 27.4% negative tweets, and AWS shows 69.1% positive and 30.9% negative tweets.
Title: Tweet Sentiment Analysis (TSA) for Cloud Providers Using Classification Algorithms and Latent Semantic Analysis
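Lexicon-based polarity scoring of the kind performed with the Bing lexicon can be sketched as counting positive and negative terms per tweet. The tiny word lists and sample tweets below are made up for illustration; they are not the paper's lexicon or dataset.

```python
# Tiny illustrative polarity lexicon in the spirit of the Bing lexicon
# (hypothetical words, not the actual lexicon used in the paper).
POSITIVE = {"reliable", "fast", "great", "scalable"}
NEGATIVE = {"outage", "slow", "expensive", "down"}

def polarity(tweet):
    """Label a tweet by the difference of positive and negative word counts."""
    tokens = tweet.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tweets = ["AWS is fast and reliable", "Azure outage again so slow"]
labels = [polarity(t) for t in tweets]
```

Aggregating such labels per provider yields the positive/negative percentages reported in the abstract.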
Pub Date: 2019-09-12. DOI: 10.4236/jdaip.2019.74014
Hiroshi Ogura, Hiromi Amano, Masato Kondo
In a previous study, we introduced dynamical aspects of written texts by regarding the serial sentence number, from the first to the last sentence of a given text, as discretized time. Using this definition of a textual timeline, we defined an autocorrelation function (ACF) for word occurrences and demonstrated its utility both for representing dynamic word correlations and for measuring word importance within the text. In this study, we seek a stochastic process governing the occurrences of a given word having strong dynamic correlations. This is valuable because words exhibiting strong dynamic correlations play a central role in developing or organizing textual contexts. While seeking this stochastic process, we find that additive binary Markov chain theory is useful for describing strong dynamic word correlations, in the sense that it can reproduce characteristics of autocovariance functions (an unnormalized version of ACFs) observed in actual written texts. Using this theory, we propose a model of time-varying probability that describes the probability of word occurrence in each sentence of a text. The proposed model considers hierarchical document structures such as chapters, sections, subsections, paragraphs, and sentences. Because such a hierarchical structure is common to most documents, our model for the occurrence probability of words has wide applicability for interpreting dynamic word correlations in actual written texts. The main contributions of this study are, therefore, demonstrating the usability of additive binary Markov chain theory for analyzing dynamic correlations in written texts, and offering a new model of word occurrence probability that takes the common hierarchical structure of documents into account.
Title: Origin of Dynamic Correlations of Words in Written Texts
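The autocovariance used to diagnose dynamic correlations can be computed directly from a word's binary occurrence sequence over the sentence timeline. The sequence below is a toy example of a bursty word (not taken from the paper's corpus), showing a positive short-lag autocovariance.

```python
def autocovariance(x, lag):
    """Sample autocovariance of an occurrence sequence at a given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / (n - lag)

# Toy binary sequence: 1 if the word appears in sentence t (illustrative).
occ = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
acov = [autocovariance(occ, k) for k in range(4)]
```

A word whose occurrences cluster in runs, as above, yields `acov[1] > 0`; an uncorrelated word's autocovariance drops to roughly zero at all positive lags.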
Pub Date: 2019-08-15. DOI: 10.4236/JDAIP.2019.73008
Raed A. Salha, Maher A. El-Hallaq, Abdelkhalek I. Alastal
The ultimate aim of a smart city is to enhance the quality of life of its residents and businesses through modern technologies, in order to reduce resource deterioration and contain overall costs. From this perspective, blockchain is one technology that has received much attention in recent years, as it offers new alternatives for individuals and institutions in the smart city context. This study explores the potential and contribution of blockchain in smart cities by reviewing the scientific literature on the concept and fundamentals of blockchain, including its most practical applications. In addition, it summarizes worldwide examples of success in using blockchain, and explores the challenges and opportunities related to this technology in smart cities. Thus, this study provides a useful reference for researchers reviewing blockchain technology.
Title: Blockchain in Smart Cities: Exploring Possibilities in Terms of Opportunities and Challenges
Pub Date: 2019-07-26. DOI: 10.4236/JDAIP.2019.73007
Wen-rong Pan, D. Lai, Yu Song, J. Follis
Unprecedented industrialization and urbanization have led to China's poor energy efficiency. In response, the Chinese government has set goals to reduce energy consumption that may include implementing new tax policies. In this paper, we investigate the relationship between energy intensity, an indicator that measures the efficiency of energy consumption, and two sources of government revenue in China: value-added tax (VAT) and corporate income tax. As a case study, we developed a Granger co-integration model to analyze the dynamic relationship of energy intensity, VAT, and corporate income tax in the non-ferrous metal industry of Jiangxi Province, China, between 1996 and 2010. Augmented Dickey-Fuller tests were used to validate the model. In our time series analyses, we found that, when controlling for corporate income tax, a one-log-unit increase in VAT resulted in a decrease of 1.17 log units in energy intensity. However, when controlling for VAT, a one-log-unit increase in corporate income tax resulted in an increase of 0.34 log units in energy intensity. Understanding the relationship between energy intensity and taxation in industries that consume high volumes of energy can greatly support China's goal of reducing energy consumption. We believe our findings add to this ongoing discussion.
Title: Time Series Analysis of Energy Intensity, Value Added Tax and Corporate Income Tax: A Case Study of the Non-Ferrous Metal Industry, Jiangxi Province, China
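The log-unit interpretation of the reported coefficients can be illustrated with a simple least-squares fit on log-transformed data. The series below is synthetic, constructed with a known elasticity of -1.17 to mirror the reported VAT effect; the paper itself estimates this from real data via Granger co-integration, not the toy regression shown here.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic log-log data: one log unit of VAT -> -1.17 log units of intensity.
log_vat = [1.0, 1.5, 2.0, 2.5, 3.0]
log_intensity = [5.0 - 1.17 * v for v in log_vat]
slope = ols_slope(log_vat, log_intensity)
```

Because both variables are in logs, the slope is an elasticity: a 1% rise in VAT is associated with roughly a 1.17% fall in energy intensity, holding the other tax fixed.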
Pub Date: 2019-03-04. DOI: 10.4236/JDAIP.2019.72003
S. Hossain, Omor Faruque
The main aim of this research is to characterize the road traffic accident scenario, its injurious effects, and its patterns in Bangladesh. Moreover, we forecast the magnitude of road traffic accidents so that decision makers can take appropriate precautionary measures. The study provides an assessment of road traffic accidents in Bangladesh and their impact, based on data collected for the period 1971 to 2017, and identifies the main causes of road accidents. The study observed that the numbers of road traffic accidents (RTAs), deaths, and injuries increased gradually, with small fluctuations, from 1971 to 2007, followed by a slow decreasing trend after 2007. Although the numbers of RTAs and deaths have shown a decreasing trend in recent years, the ratio of deaths to accidents increased significantly. The rate of registered vehicles per 10,000 people increased moderately throughout the period, with a sharp increase from 2009. The highest percentages of RTAs (34%) and RTA deaths (32%) occur in Dhaka division, while the lowest percentage of RTAs (4%) is found in Barisal and Sylhet divisions and the lowest percentage of RTA deaths (3%) in Barisal division. The maximum number of injuries occurred between ages 21 and 30, while the maximum number of deaths occurred between ages 11 and 30. Most RTAs and RTA deaths are caused by vehicles running over pedestrians and by head-on collisions. The severity of road accidents and the number of deaths are higher during festive periods because of the higher frequency of travel. The time plot shows a decreasing movement from 2012 to 2015 but an increase from 2015 to 2017. An additive time series model approach is applied, including the estimation of trend, seasonal variation, and random variation using the triple exponential smoothing method. We forecast RTAs, eliminating seasonal impact, for the next three consecutive years (2018-2020) with 95% confidence intervals using Holt-Winters exponential smoothing.
Title: Road Traffic Accident Scenario, Pattern and Forecasting in Bangladesh
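The Holt-Winters triple exponential smoothing used for the forecast can be sketched as follows. The additive variant, the toy series, the period, and the smoothing constants below are all illustrative assumptions; the paper's actual series, seasonal period, and fitted parameters are not reproduced here.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Minimal additive Holt-Winters (triple exponential) smoothing forecast."""
    # Initialize level, trend, and seasonal components from the first seasons.
    level = sum(series[:period]) / period
    trend = sum(series[period + i] - series[i] for i in range(period)) / period ** 2
    seasonal = [series[i] - level for i in range(period)]
    for t in range(period, len(series)):
        s = seasonal[t % period]
        last_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (series[t] - level) + (1 - gamma) * s
    return [level + (h + 1) * trend + seasonal[(len(series) + h) % period]
            for h in range(horizon)]

# Toy series with period-4 seasonality and a linear trend (illustrative).
data = [10, 14, 8, 12, 14, 18, 12, 16, 18, 22, 16, 20]
forecast = holt_winters_additive(data, period=4, alpha=0.5, beta=0.3, gamma=0.4,
                                 horizon=4)
```

The forecast extends both the estimated trend and the repeating seasonal pattern, which is the "eliminating seasonal impact" step described in the abstract; confidence intervals would come from the residual variance.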
Pub Date: 2019-03-04. DOI: 10.4236/JDAIP.2019.72004
Hiroshi Ogura, Hiromi Amano, Masato Kondo
In this study, we regard written texts as time series data and investigate dynamic correlations of word occurrences by utilizing an autocorrelation function (ACF). After defining an appropriate formula for the ACF that is suitable for expressing the dynamic correlations of words, we use the formula to calculate ACFs for frequent words in 12 books. The ACFs obtained can be classified into two groups: one group of ACFs shows dynamic correlations, with these ACFs well described by a modified Kohlrausch-Williams-Watts (KWW) function; the other group of ACFs shows no correlations, with these ACFs fitted by a simple stepdown function. A word having the former ACF is called a Type-I word and a word with the latter ACF is called a Type-II word. It is also shown that the ACFs of Type-II words can be derived theoretically by assuming that the stochastic process governing word occurrence is a homogeneous Poisson point process. Based on the fitting of the ACFs by KWW and stepdown functions, we propose a measure of word importance which expresses the extent to which a word is important in a particular text. The validity of the measure is confirmed using Kleinberg's burst detection algorithm.
Title: Measuring Dynamic Correlations of Words in Written Texts with an Autocorrelation Function
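The Type-II case can be checked numerically: under a homogeneous Poisson point process, per-sentence occurrences reduce to independent Bernoulli draws, so the normalized ACF should be near zero at all positive lags. The occurrence probability and sequence length below are illustrative, not fitted to any of the 12 books.

```python
import random

def autocovariance(x, lag):
    """Sample autocovariance of an occurrence sequence at a given lag."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / (n - lag)

# Homogeneous Poisson process over sentences -> independent Bernoulli
# occurrences per sentence (probability 0.2, illustrative).
rng = random.Random(42)
occ = [1 if rng.random() < 0.2 else 0 for _ in range(20000)]
var = autocovariance(occ, 0)
acf_lag5 = autocovariance(occ, 5) / var  # normalized ACF at lag 5
```

A Type-I word, by contrast, would show a slowly decaying ACF shaped like the modified KWW function rather than this flat profile.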
Pub Date: 2019-03-04. DOI: 10.4236/JDAIP.2019.72005
Xionghui Wen
In view of the lack of patent big data in research on technology foresight in the industrial robot field, this paper introduces an improved method based on patent mining and knowledge mapping. First, subject-action-object (SAO) structures are extracted from selected patents; second, the similarity between patents is calculated from the extracted SAO structures; third, a patent network and a patent map are drawn from the resulting patent similarity matrix. The technology evolution process and future trends of industrial robots are summarized from the patent network, and potential future technology opportunities are predicted from technological vacancies identified in the patent map. Finally, this paper identifies six key technical areas and four potential technical opportunities in the field of industrial robots.
Title: Technology Foresight Research of Industrial Robot Based on Patent Analysis
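The SAO-based similarity step can be sketched as a set-overlap measure over extracted triples. The triples below are hypothetical examples, and Jaccard overlap stands in for whatever SAO similarity measure the paper actually uses.

```python
# Hypothetical SAO (subject-action-object) triples for two patents
# (illustrative; not extracted from real patent text).
patent_a = {("arm", "grips", "workpiece"), ("sensor", "detects", "position"),
            ("controller", "adjusts", "torque")}
patent_b = {("arm", "grips", "workpiece"), ("camera", "detects", "defect")}

def sao_similarity(a, b):
    """Jaccard similarity between two sets of SAO triples."""
    return len(a & b) / len(a | b) if a | b else 0.0

sim = sao_similarity(patent_a, patent_b)
```

Computing this pairwise over all selected patents yields the similarity matrix from which the patent network and patent map are drawn.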
Pub Date: 2019-02-22. DOI: 10.4236/jdaip.2019.71002
A. Williams
Penalized ordinal outcome models were developed to model high-dimensional data with ordinal outcomes. One option is the penalized stereotype logit, which includes nonlinear combinations of parameter estimates, yet optimization algorithms that assume linearity and function convexity have previously been applied to fit this model. This study proposes applying the adaptive moment estimation (Adam) optimizer, which is suited to nonlinear optimization, to the elastic net penalized stereotype logit model. The proposed model is compared to the L1-penalized ordinalgmifs stereotype model. Both methods were applied to simulated and real data, with non-Hodgkin lymphoma (NHL) cancer subtypes as the outcome, and the results are presented and discussed.
Title: Ordinal Outcome Modeling: The Application of the Adaptive Moment Estimation Optimizer to the Elastic Net Penalized Stereotype Logit
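The core optimization idea can be sketched with Adam updates on a one-dimensional logistic loss carrying an elastic net penalty. This is a binary logistic stand-in for the stereotype logit, with made-up data and hyperparameters, intended only to show the moment-estimate updates, not the paper's model.

```python
import math

def adam_elastic_net_logit(xs, ys, lam=0.1, l1_ratio=0.5, lr=0.05, steps=2000):
    """Adam on a 1-D logistic loss with an elastic net penalty (illustrative)."""
    w, m, v = 0.0, 0.0, 0.0
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        # Gradient of the mean logistic loss plus the elastic net subgradient.
        g = sum((1 / (1 + math.exp(-w * x)) - y) * x
                for x, y in zip(xs, ys)) / len(xs)
        g += lam * (l1_ratio * (1 if w > 0 else -1 if w < 0 else 0)
                    + (1 - l1_ratio) * w)
        m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment (uncentered var) estimate
        mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)  # bias correction
        w -= lr * mhat / (math.sqrt(vhat) + eps)
    return w

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w = adam_elastic_net_logit(xs, ys)
```

The bias-corrected moment estimates are what make Adam robust on nonconvex, nonsmooth objectives like this penalized loss, which is the motivation the abstract gives for preferring it over convexity-assuming optimizers.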