Implementation of an Algorithm for Automatic Segmentation of Texts based on Stylometric Analysis
S. C. Bolea
2021 7th International Symposium on Electrical and Electronics Engineering (ISEEE), published 2021-10-28
DOI: https://doi.org/10.1109/iseee53383.2021.9628868
Citation count: 0
Abstract
Segmenting a text into larger or smaller parts is a useful step in natural language processing. Starting from an initial segmentation of a literary text into three segments (made by the author or by a presumed expert reader), we developed an algorithm that segments the document at the page level, where a page consists of a number of phrases, while preserving the initial number of sections as they appear in the printed document. In this paper we present a software application for stylometric analysis that computes several distances (Euclidean, cosine) and other measures (Bray-Curtis dissimilarity, correlation) for the resulting segments in order to determine which segmentation of the text is best. The software serves to illustrate the algorithm; the language and form in which it is implemented matter little.
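The four measures named in the abstract can be illustrated with a minimal Python sketch. It is not the author's software: the feature set (`FUNCTION_WORDS`), the helper `profile`, and the search routine `best_split` are hypothetical. The abstract does not say how the best segmentation is chosen, so the selection criterion used here (largest mean pairwise distance between segment profiles) is only an assumption.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical feature set (an assumption, not taken from the paper).
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in")


def profile(text, vocabulary=FUNCTION_WORDS):
    """Relative frequency of each vocabulary word among the text's tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in vocabulary]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; returns 0.0 by convention if a vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 0.0


def bray_curtis(u, v):
    """Sum of absolute differences divided by sum of absolute sums."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0


def correlation_distance(u, v):
    """1 - Pearson correlation, computed as cosine distance of centered vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine_distance([a - mu for a in u], [b - mv for b in v])


def best_split(pages, n_segments=3, distance=euclidean):
    """Try every cut of `pages` into n contiguous segments.

    Assumed criterion: keep the cut points whose segment profiles have the
    largest mean pairwise distance. Returns (None, -1.0) if there are too
    few pages to form n_segments non-empty segments.
    """
    if n_segments < 2:
        raise ValueError("n_segments must be at least 2")
    best_cuts, best_score = None, -1.0
    for cuts in combinations(range(1, len(pages)), n_segments - 1):
        bounds = (0, *cuts, len(pages))
        vectors = [profile(" ".join(pages[i:j])) for i, j in zip(bounds, bounds[1:])]
        pairs = list(combinations(vectors, 2))
        score = sum(distance(u, v) for u, v in pairs) / len(pairs)
        if score > best_score:
            best_cuts, best_score = cuts, score
    return best_cuts, best_score
```

Calling `best_split(pages, distance=bray_curtis)` with a list of page strings swaps in a different measure. The exhaustive search is only practical for short documents, since the number of candidate cuts grows quadratically with the page count when three segments are kept.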