Optimizing an algorithm for finding sentence ends: applying linguistic methodology
D. W. Coleman. ACM-SE 17, 1979-04-09. DOI: 10.1145/503506.503533 (https://doi.org/10.1145/503506.503533)
Most computer text editors are oriented around a line of raw text as entered. When moving text, a more natural unit is the sentence. APLATS is a locally written text editor that uses the sentence as its basic unit of text. APLATS ends a sentence under two conditions: (a) a "?" or "." followed by a blank, a single-character format delimiter, or a carriage return (as typed); and (b) a format delimiter that forces the start of a new line of text.

This definition of a sentence end sometimes produces "excess" sentence divisions, and at other times it fails to produce divisions where they are normally expected. Linguistic methodology was applied to determine whether the algorithm for finding sentence ends could be improved.

Condition (b) will only rarely produce an "incorrect" sentence boundary. Consider the type of case where it does: a sentence is broken in the middle at the end of a line, for example when a long quotation set off from the main body of the text separates the start of the sentence from the rest of it (or when the quotation ends the sentence). Handling the long quotation as one or more independent "sentences" is probably preferable for editing anyway. Thus, condition (b) presents no major problems.

It is condition (a) that produces many "unexpected", and perhaps inconvenient, sentence breaks. In (1)-(4), for example, APLATS forces "excess" sentence ends at the points indicated by a "#". Further, it fails to produce sentence breaks in (5)-(6) at the points indicated by a "+", where sentence breaks would normally be expected. (Each of the sentences (1)-(13) is assumed to be extracted from a larger, unspecified context.)
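Conditions (a) and (b) can be read as a simple character scan. The following is a minimal sketch of that reading, not the APLATS implementation. The abstract does not name the format delimiter characters, so `format_delims` and `newline_delims` are hypothetical placeholders, and treating the end of the text like a carriage return is also an assumption.

```python
SENTENCE_MARKS = "?."  # the two terminators named in condition (a)


def find_sentence_ends(text, format_delims="@", newline_delims="/"):
    """Return the indices of characters that end a sentence.

    format_delims and newline_delims are hypothetical stand-ins for the
    APLATS format delimiters, which the abstract does not specify.
    """
    ends = []
    for i, ch in enumerate(text):
        # Assumption: the end of the text behaves like a carriage return.
        nxt = text[i + 1] if i + 1 < len(text) else "\n"
        if ch in SENTENCE_MARKS:
            # Condition (a): "?" or "." followed by a blank, a format
            # delimiter, or a carriage return.
            if nxt in " \n\r" or nxt in format_delims or nxt in newline_delims:
                ends.append(i)
        elif ch in newline_delims:
            # Condition (b): a delimiter that forces a new line also forces
            # a sentence end. Skip it if the preceding character already
            # closed a sentence, so one boundary is not counted twice.
            if not ends or ends[-1] != i - 1:
                ends.append(i)
    return ends
```

Under this reading, `find_sentence_ends("Dr. Who? Yes.")` reports a boundary after "Dr." as well as after "Who?" and "Yes.". An abbreviation followed by a blank is one plausible source of the "excess" breaks attributed to condition (a), though it is offered here only as an illustration and is not one of the paper's examples (1)-(4), which are not reproduced in this record.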