{"title":"Construction of Scholarly n-Gram from Huge Text Data","authors":"Myunggwon Hwang, Ha-neul Yeom, Mi-Nyeong Hwang, Hanmin Jung","doi":"10.1109/IMIS.2014.4","DOIUrl":null,"url":null,"abstract":"The ultimate goal of this research is to provide n-gram data that is specialized for scholarly utilization. To this end, this paper outlines the construction of a scholarly n-gram through the processing of large text documents. Many researchers, especially non-native English language speakers, find it difficult to construct sentences and paragraphs with appropriate and disambiguated words. One of the methods that can assist them is the provision of n-gram data. A representative n-gram known as Web 1T 5-Gram Version 1, which was constructed by processing virtually all documents retrieved using Google, already exists. However, this data contain unfocused word recommendations, therefore, they are not suitable. Consequently, we are constructing a scholarly n-gram. In this paper, we demonstrate the efficiency of n-gram using Web 1T unigram and introduce and discuss the specifics of our research plan related to scholarly n-gram.","PeriodicalId":345694,"journal":{"name":"2014 Eighth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 Eighth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IMIS.2014.4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The ultimate goal of this research is to provide n-gram data that is specialized for scholarly utilization. To this end, this paper outlines the construction of a scholarly n-gram through the processing of large text documents. Many researchers, especially non-native English language speakers, find it difficult to construct sentences and paragraphs with appropriate and disambiguated words. One of the methods that can assist them is the provision of n-gram data. A representative n-gram known as Web 1T 5-Gram Version 1, which was constructed by processing virtually all documents retrieved using Google, already exists. However, this data contain unfocused word recommendations, therefore, they are not suitable. Consequently, we are constructing a scholarly n-gram. In this paper, we demonstrate the efficiency of n-gram using Web 1T unigram and introduce and discuss the specifics of our research plan related to scholarly n-gram.