With the growth of content on the Web, any information can be plagiarized. Plagiarism is the use of another person's ideas without naming the source. Plagiarism detection is therefore necessary but complicated: it faces significant challenges given the large amount of material on the World Wide Web and the limited access to a substantial part of it. In this paper, we present a novel plagiarism detection method for French documents that combines the intrinsic and extrinsic aspects of plagiarism detection. We achieved good results with both approaches: in the first tests of the extrinsic method we obtained an accuracy of 62%, and for the intrinsic method we obtained an F-score of 0.328.
{"title":"Hybrid plagiarism detection method for French language","authors":"Maryam Elamine, Seifeddine Mechti, Lamia Hadrich Belguith","doi":"10.3233/his-200284","DOIUrl":"https://doi.org/10.3233/his-200284","url":null,"abstract":"With the growth of the content found throughout the Web, every information can be plagiarized. Plagiarism is the process of using the ideas of another without naming the source. Consequently, plagiarism detection is necessary but complicated as it is often facing significant challenges given the large amount of material on the World-wide-web and the limited access to a substantial part of them. In this paper, we present a novel plagiarism detection method for French documents. The proposed method combines the intrinsic and extrinsic aspects for plagiarism detection. We achieved good results with both approaches. For the extrinsic method, we achieved an accuracy of 62% for the first tests of the method. As for the intrinsic, we achieved an F-score of 0.328.","PeriodicalId":88526,"journal":{"name":"International journal of hybrid intelligent systems","volume":"5 1","pages":"163-175"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90382462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
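Extrinsic detection compares a suspicious document against candidate source documents. The abstract does not describe how the authors compare documents. The sketch below uses a common generic baseline instead: word n-gram containment, meaning the share of the suspicious text's n-grams that also appear in a source. The function names and the 0.5 threshold are hypothetical illustration choices, not part of the published method.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set:
    """Return the set of lower-cased word n-grams in `text`.

    Python 3 regexes treat accented letters as word characters, so French
    words such as "canapé" or "déjà" stay whole.
    """
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious: str, source: str, n: int = 3) -> float:
    """Fraction of the suspicious document's n-grams that also occur in the source."""
    susp = word_ngrams(suspicious, n)
    if not susp:
        return 0.0
    return len(susp & word_ngrams(source, n)) / len(susp)


def flag_sources(suspicious: str, sources: dict, threshold: float = 0.5, n: int = 3) -> list:
    """Names of candidate sources whose containment score reaches the (hypothetical) threshold."""
    return sorted(
        name for name, text in sources.items()
        if containment(suspicious, text, n) >= threshold
    )


if __name__ == "__main__":
    corpus = {
        "wiki_chat": "Le chat noir dort sur le canapé du salon depuis midi.",
        "wiki_pluie": "La pluie tombe sur la ville depuis ce matin.",
    }
    suspect = "Le chat noir dort sur le canapé du salon."
    print(flag_sources(suspect, corpus))  # ['wiki_chat']
```

Containment is used here rather than a symmetric measure such as Jaccard similarity. The reason is that a short plagiarized passage inside a long source should still score high.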
{"title":"Critical Instances Removal based Under-Sampling (CIRUS): A solution for class imbalance problem","authors":"G. Rekha, V. Reddy, A. Tyagi","doi":"10.3233/his-200279","DOIUrl":"https://doi.org/10.3233/his-200279","url":null,"abstract":"","PeriodicalId":88526,"journal":{"name":"International journal of hybrid intelligent systems","volume":"116 1","pages":"55-66"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79132113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The purpose of this research is to develop and adapt a complex of hybrid mathematical and instrumental methods of analysis and risk management through the prediction of natural time series with memory. The paper poses the problem of developing a constructive method for the predictive analysis of time series, following the current trend of using so-called “graphical tests” when modeling time series with nonlinear dynamics methods. The main purpose of these graphical tests is to identify both stable and unstable quasiperiodic cycles (quasi-cycles). Modern computer technologies that allow complex phenomena and processes to be studied in detail were used as a toolkit for implementing the nonlinear dynamics methods. For the predictive analysis of time series, the authors propose a modified R/S-analysis algorithm, together with phase analysis methods for constructing phase portraits, in order to identify cycles in the studied time series and confirm the forecast. This approach differs from classical forecasting methods in that it accounts for trends, and the authors regard it as a new tool for identifying the cyclical components of the time series considered. The proposed hybrid complex gives the decision maker more detailed information than can be obtained with classical statistical methods. In this paper, the authors analyzed the runoff time series of Kuban mountain rivers, showed that the classical Hurst method cannot be used for their predictive analysis, and proved the consistency of the proposed hybrid toolkit for identifying the cyclic components of the time series and forecasting it. The study is particularly relevant given the absence of effective methods for predicting natural-economic time series, despite the proven need to study them and their risk-extreme levels. The work was supported by the Russian Foundation for Basic Research (Grant No 17-06-00354 A).
{"title":"Methods of nonlinear dynamics as a hybrid tool for predictive analysis and research of risk-extreme levels","authors":"E. Popova, L. Costa, A. Kumratova, D. Zamotajlova","doi":"10.3233/HIS-190272","DOIUrl":"https://doi.org/10.3233/HIS-190272","url":null,"abstract":". The purpose of this research is to develop and adapt a complex of hybrid mathematical and instrumental methods of analysis and risk management through the prediction of natural time series with memory. The paper poses the problem of developing a constructive method for predictive analysis of time series within the current trend of using so-called “graphical tests” in the process of time series modeling using nonlinear dynamics methods. The main purpose of using graphical tests is to identify both stable and unstable quasiperiodic cycles (quasi-cycles). Modern computer technologies which allow to study in detail complex phenomena and processes were used as a toolkit for the implementation of nonlinear dynamics methods. Authors propose to use for the predictive analysis of time series a modified R/S -analysis algorithm, as well as phase analysis methods for constructing phase portraits in order to identify cycles of the studied time series and confirm the forecast. This approach differs from classical forecasting methods by implementing trends accounting and appears to the authors as a new tool for identifying the cyclical components of the considered time series. Using the proposed hybrid complex, the decision maker has more detailed information that cannot be obtained using classical statistics methods. In this paper, authors analyzed the time series of Kuban mountain river runoffs, revealed the impossibility of using the classical Hurst method for their predictive analysis and also proved the consistency of using the proposed hybrid toolkit to identify the cyclic components of the time series and predict it. The study acquires particular relevance in the light of the absence of any effective methods for predicting natural-economic time series, despite the proven need to study them and their risk-extreme levels. The work was supported by Russian Foundation for Basic Research (Grant No 17-06-00354 A).","PeriodicalId":88526,"journal":{"name":"International journal of hybrid intelligent systems","volume":"8 1","pages":"221-241"},"PeriodicalIF":0.0,"publicationDate":"2019-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89487305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
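The abstract contrasts the classical Hurst (R/S) method with the authors' modified R/S-analysis. The modification itself is not described, so the sketch below shows only the classical procedure the abstract calls unsuitable for the Kuban runoff data.

The procedure works as follows:
- Split the series into non-overlapping windows.
- For each window, compute the range of cumulative deviations from the window mean, divided by the window's standard deviation.
- Average these R/S values per window size.
- Regress log(R/S) on log(window size). The slope estimates the Hurst exponent H.

The window-doubling scheme and `min_window=8` are assumptions made for illustration.

```python
import math
import random


def rescaled_range(chunk):
    """Classical R/S statistic of one window: range of cumulative deviations / std."""
    n = len(chunk)
    mean = sum(chunk) / n
    cum, cum_dev = 0.0, []
    for x in chunk:
        cum += x - mean
        cum_dev.append(cum)
    r = max(cum_dev) - min(cum_dev)
    s = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return r / s if s > 0 else float("nan")


def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    n = len(series)
    if n < 4 * min_window:
        raise ValueError("series too short for at least two window sizes")
    log_w, log_rs = [], []
    w = min_window
    while w <= n // 2:
        # Non-overlapping windows of length w; constant windows (std = 0) are skipped.
        values = [rescaled_range(series[i:i + w]) for i in range(0, n - w + 1, w)]
        values = [v for v in values if not math.isnan(v)]
        if values:
            log_w.append(math.log(w))
            log_rs.append(math.log(sum(values) / len(values)))
        w *= 2
    if len(log_w) < 2:
        raise ValueError("not enough non-constant windows to fit a slope")
    # Ordinary least-squares slope of log_rs on log_w.
    mx = sum(log_w) / len(log_w)
    my = sum(log_rs) / len(log_rs)
    num = sum((x - mx) * (y - my) for x, y in zip(log_w, log_rs))
    den = sum((x - mx) ** 2 for x in log_w)
    return num / den


if __name__ == "__main__":
    rng = random.Random(42)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    walk, total = [], 0.0
    for x in noise:
        total += x
        walk.append(total)
    # White noise should give H near 0.5; a random walk should give H near 1.
    print(round(hurst_rs(noise), 2), round(hurst_rs(walk), 2))
```

For white noise the classical estimator is known to be biased slightly above 0.5 at small window sizes. This is one of the reasons modified R/S variants exist.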
Classification tasks are tackled in a plethora of scientific fields, such as astronomy, finance, healthcare, human mobility, and pharmacology, to name a few. Classification is a supervised learning approach that uses labeled data to assign instances to classes. A common approach to these tasks is the use of ensemble methods, which employ a set of models instead of just one and combine the predictions of every model to obtain the prediction of the whole. Common obstacles in ensemble learning are the choice of base models and how best to aggregate the individual predictions into the ensemble's prediction. An ensemble is also expected to mitigate the weaknesses of its members while pooling their strengths. It is in this context that Evolutionary Directed Graph Ensembles (EDGE) thrive. EDGE is a machine learning tool based on social dynamics and on modeling trust between human beings using graph theory. Evolutionary algorithms are used to evolve ensembles of models arranged in a directed acyclic graph. The connections in the graph map the trust of each node in its predecessors. The novelty of this approach stems from the fusion of ensemble learning with graphs and evolutionary algorithms. A limitation of EDGE is that it focuses only on changing the topology of the graph ensembles, with its authors hypothesizing about using the learned graphs for other tasks. with gains as substantial as 30 percentage points. Bootstrapping was shown to be effective in improving predictive power, as exploiting previous runs improved the results on 19 out of 21 datasets. The contributions can be summarized as a novel way to evolve graph ensembles by also evolving the weights between their nodes, coupled with the idea of bootstrapping any dataset using previous runs on other datasets. The analysis of dataset choice for bootstrapping led to the proposal of a similarity metric between datasets that can guide this choice without an exhaustive or random search over the available datasets.
{"title":"EDGE: Evolutionary Directed Graph Ensembles","authors":"Xavier Fontes, D. Silva","doi":"10.3233/HIS-190273","DOIUrl":"https://doi.org/10.3233/HIS-190273","url":null,"abstract":"Classification tasks are being tackled in a plethora of scientific fields, such as astronomy, finance, healthcare, human mobility, and pharmacology, to name a few. Classification is defined as a supervised learning approach that uses labeled data to assign instances to classes. A common approach to tackle these tasks are ensemble methods. These are methods that employ a set of models, instead of just one and combine the predictions of every model to obtain the prediction of the whole. Common obstacles in ensemble learning are the choice of base models to use and how best to aggregate the predictions of each individual to produce the ensemble’s prediction. It is also expected to mitigate the weaknesses of its members while pooling their strengths together. It is in this context that Evolutionary Directed Graph Ensembles (EDGE) thrives. EDGE is a machine learning tool based on social dynamics and modeling of trust in human beings using graph theory. Evolutionary Algorithms are used to evolve ensembles of models that are arranged in a directed acyclic graph structure. The connections in the graph map the trust of each node in its predecessors. The novelty in such an approach stems from the fusion of ensemble learning with graphs and evolutionary algorithms. A limitation of EDGE is that it focuses only on changing the topology of the graph ensembles, with the authors of hypothesizing about using the learned graphs for other tasks. with gains as substantial as 30 percentage points. The bootstrap was shown to be effective in improving the prediction power, with the exploitation of previous runs improved the results on 19 out of 21 datasets. The contributions can be summarized as a novel way to evolve graph ensembles, by also evolving the weights between nodes of the graphs, coupled with the idea of bootstrapping any dataset using previous runs from other datasets. The analysis of dataset choice for the bootstrapping lead to the proposal of a similarity metric between datasets that can be used to facilitate the choice for bootstrapping, without exhaustive or random search in the available datasets. uma métrica de semelhança que pode ser utilizada em vez de uma pesquisa exaustiva nos conjuntos de dados disponíveis.","PeriodicalId":88526,"journal":{"name":"International journal of hybrid intelligent systems","volume":"57 1","pages":"243-256"},"PeriodicalIF":0.0,"publicationDate":"2019-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75062825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
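The abstract says that EDGE arranges models in a directed acyclic graph whose edges encode each node's trust in its predecessors, and that the extended version also evolves those edge weights. The exact aggregation and mutation rules are not given. The sketch below is therefore one plausible, hypothetical reading:
- Each node adds its predecessors' class-probability vectors, scaled by trust, to its own base-model prediction and renormalizes.
- A simple (1+1) evolution strategy mutates the weights and keeps any candidate that is not worse on the supplied samples.

All function names and parameters here are illustrative assumptions.

```python
import random
from graphlib import TopologicalSorter


def node_outputs(base_probs, predecessors, weights):
    """Propagate class-probability vectors through a trust-weighted DAG.

    base_probs:   {node: [p_class0, p_class1, ...]} from each node's own base model
    predecessors: {node: [pred, ...]}  (the DAG; root nodes may be omitted)
    weights:      {(pred, node): trust weight >= 0}
    Hypothetical rule: own prediction + trust-scaled predecessor outputs, renormalized.
    """
    out = {}
    for node in TopologicalSorter(predecessors).static_order():
        combined = list(base_probs[node])
        for pred in predecessors.get(node, []):
            w = weights[(pred, node)]
            combined = [c + w * p for c, p in zip(combined, out[pred])]
        total = sum(combined)
        out[node] = [c / total for c in combined] if total > 0 else combined
    return out


def accuracy(samples, labels, predecessors, weights, sink):
    """Share of samples whose sink-node argmax equals the label."""
    hits = 0
    for probs, y in zip(samples, labels):
        out = node_outputs(probs, predecessors, weights)[sink]
        hits += max(range(len(out)), key=out.__getitem__) == y
    return hits / len(labels)


def evolve_weights(samples, labels, predecessors, weights, sink,
                   generations=200, sigma=0.2, seed=0):
    """(1+1) evolution strategy over edge weights: mutate, keep if not worse."""
    rng = random.Random(seed)
    best = dict(weights)
    best_acc = accuracy(samples, labels, predecessors, best, sink)
    for _ in range(generations):
        # Gaussian mutation, clipped so trust weights stay non-negative.
        cand = {e: max(0.0, w + rng.gauss(0.0, sigma)) for e, w in best.items()}
        cand_acc = accuracy(samples, labels, predecessors, cand, sink)
        if cand_acc >= best_acc:
            best, best_acc = cand, cand_acc
    return best, best_acc


if __name__ == "__main__":
    dag = {"C": ["A", "B"]}  # A and B feed the sink node C
    w0 = {("A", "C"): 1.0, ("B", "C"): 1.0}
    probs = {"A": [0.9, 0.1], "B": [0.2, 0.8], "C": [0.5, 0.5]}
    print(node_outputs(probs, dag, w0)["C"])
```

The abstract also mentions bootstrapping from previous runs on other datasets. In this sketch, that would amount to passing weights evolved on another dataset as the starting `weights` instead of uniform ones. This is an interpretation, not the paper's stated procedure.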
{"title":"Forecasting banking sectors in Indian stock markets using machine intelligence","authors":"Arjun R, K. R. Suprabha","doi":"10.3233/HIS-190266","DOIUrl":"https://doi.org/10.3233/HIS-190266","url":null,"abstract":"","PeriodicalId":88526,"journal":{"name":"International journal of hybrid intelligent systems","volume":"292 4","pages":"129-142"},"PeriodicalIF":0.0,"publicationDate":"2019-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/HIS-190266","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72455232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coarse grained parallel quantum genetic algorithm for reconfiguration and service restoration of electric power networks","authors":"Ahmed Adel Hieba, N. Abbasy, A. Abdelaziz","doi":"10.3233/HIS-190268","DOIUrl":"https://doi.org/10.3233/HIS-190268","url":null,"abstract":"","PeriodicalId":88526,"journal":{"name":"International journal of hybrid intelligent systems","volume":"48 1","pages":"155-171"},"PeriodicalIF":0.0,"publicationDate":"2019-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80981422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance improvement of a genetic algorithm using a novel restart strategy with elitism principle","authors":"A. Das, D. K. Pratihar","doi":"10.3233/HIS-180257","DOIUrl":"https://doi.org/10.3233/HIS-180257","url":null,"abstract":"","PeriodicalId":88526,"journal":{"name":"International journal of hybrid intelligent systems","volume":"1967 1","pages":"1-15"},"PeriodicalIF":0.0,"publicationDate":"2019-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91396327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical uncertainties are rarely incorporated into machine learning algorithms, especially for anomaly detection. Here we present the Bayesian Anomaly Detection And Classification (BADAC) formalism, which provides a unified statistical approach to classification and anomaly detection within a hierarchical Bayesian framework. BADAC deals with uncertainties by marginalising over the unknown true value of the data. Using simulated data with Gaussian noise as an example, BADAC is shown to be superior to standard algorithms in both classification and anomaly detection performance in the presence of uncertainties. Additionally, BADAC provides well-calibrated classification probabilities, valuable for use in scientific pipelines. We show that BADAC can work in online mode and is fairly robust to model errors, which can be diagnosed through model-selection methods. In addition, it can perform unsupervised new class detection and can naturally be extended to search for anomalous subsets of data. BADAC is therefore ideal where computational cost is not a limiting factor and statistical rigour is important. We discuss approximations to speed up BADAC, such as the use of Gaussian processes, and finally introduce a new metric, the Rank-Weighted Score (RWS), that is particularly suited to evaluating an algorithm’s ability to detect anomalies.
{"title":"Bayesian Anomaly Detection and Classification","authors":"E. Roberts, B. Bassett, M. Lochner","doi":"10.3233/his-200282","DOIUrl":"https://doi.org/10.3233/his-200282","url":null,"abstract":"Statistical uncertainties are rarely incorporated into machine learning algorithms, especially for anomaly detection. Here we present the Bayesian Anomaly Detection And Classification (BADAC) formalism, which provides a unified statistical approach to classification and anomaly detection within a hierarchical Bayesian framework. BADAC deals with uncertainties by marginalising over the unknown, true, value of the data. Using simulated data with Gaussian noise as an example, BADAC is shown to be superior to standard algorithms in both classification and anomaly detection performance in the presence of uncertainties. Additionally, BADAC provides well-calibrated classification probabilities, valuable for use in scientific pipelines. We show that BADAC can work in online mode and is fairly robust to model errors, which can be diagnosed through model-selection methods. In addition it can perform unsupervised new class detection and can naturally be extended to search for anomalous subsets of data. BADAC is therefore ideal where computational cost is not a limiting factor and statistical rigour is important. We discuss approximations to speed up BADAC, such as the use of Gaussian processes, and finally introduce a new metric, the Rank-Weighted Score (RWS), that is particularly suited to evaluating an algorithm’s ability to detect anomalies.","PeriodicalId":88526,"journal":{"name":"International journal of hybrid intelligent systems","volume":"1 1","pages":"426-435"},"PeriodicalIF":0.0,"publicationDate":"2019-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/his-200282","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49644196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
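The key step the BADAC abstract names is marginalising over the unknown true value of noisy data. Assume Gaussian measurement noise on both a test point x (std sx) and a training example y (std sy), and a flat prior on the shared true value t. Then the integral of N(x | t, sx²) N(y | t, sy²) over t equals N(x | y, sx² + sy²).

The sketch below is a simplified illustration, not the paper's full hierarchical model. It works as follows:
- Average this marginal likelihood over each class's noisy training examples.
- Turn the class likelihoods into posterior class probabilities.
- Use the negative log evidence as a hypothetical anomaly score.

```python
import math


def log_marginal(x, sx, y, sy):
    """Sum over independent features of log ∫ N(x | t, sx²) N(y | t, sy²) dt,
    with a flat prior on the unknown true value t; each integral is N(x | y, sx² + sy²)."""
    total = 0.0
    for xi, sxi, yi, syi in zip(x, sx, y, sy):
        var = sxi ** 2 + syi ** 2
        total -= 0.5 * (math.log(2.0 * math.pi * var) + (xi - yi) ** 2 / var)
    return total


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def class_log_likelihood(x, sx, examples):
    """Log of the marginal likelihood averaged over a class's (y, sy) training examples."""
    logs = [log_marginal(x, sx, y, sy) for y, sy in examples]
    return logsumexp(logs) - math.log(len(logs))


def classify(x, sx, classes, priors=None):
    """Return (posterior class probabilities, anomaly score = -log evidence)."""
    names = list(classes)
    if priors is None:
        priors = {k: 1.0 / len(names) for k in names}
    log_post = {k: class_log_likelihood(x, sx, classes[k]) + math.log(priors[k])
                for k in names}
    log_evidence = logsumexp(list(log_post.values()))
    probs = {k: math.exp(v - log_evidence) for k, v in log_post.items()}
    return probs, -log_evidence


if __name__ == "__main__":
    training = {
        "A": [([0.0, 0.0], [0.1, 0.1]), ([0.2, -0.1], [0.1, 0.1])],
        "B": [([5.0, 5.0], [0.1, 0.1])],
    }
    print(classify([0.1, 0.0], [0.1, 0.1], training))    # near class A
    print(classify([50.0, 50.0], [0.1, 0.1], training))  # far from every class
```

In this sketch a point far from every class gets a much larger anomaly score, even though its normalized class probabilities still sum to one. Keeping the unnormalized evidence is what allows anomaly detection alongside classification.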
{"title":"Solving the travelling salesman problem using fuzzy and simplified variants of ant supervised by PSO with local search policy, FAS-PSO-LS, SAS-PSO-LS","authors":"N. Rokbani, A. Abraham, Ikram Twir, A. Haqiq","doi":"10.3233/HIS-180258","DOIUrl":"https://doi.org/10.3233/HIS-180258","url":null,"abstract":"","PeriodicalId":88526,"journal":{"name":"International journal of hybrid intelligent systems","volume":"14 1","pages":"17-26"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74724978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}