Log Analysis and Text Mining on Internet Access to Dissertations of the INSEPS (Institut National Superieur de l'Education Populaire et du Sport) Dakar, Senegal
Pierpaolo Rossi, Anastasie Thiaw
African Research & Documentation, 2012, pp. 79-90. https://doi.org/10.1017/s0305862x00020586
Introduction

The dissertation collection of INSEPS (Higher National Institute of Popular Education and Sport, Dakar, Senegal) consists of 152 documents in PDF format, covering academic work submitted between 2005 and 2008 (2005: 21, 2006: 29, 2007: 45 and 2008: 57), together with references to all the dissertations available in the INSEPS library (imported from a CDS/ISIS database).

Since January 2011 these have been hosted on the BEEP (Electronic libraries in partnership) website, which uses the Greenstone software (Rossi, 2011). The PDF files of the collection were created either by scanning paper copies or by converting electronic versions (Word files). The electronic collection was assembled as part of the SIST (System for Scientific and Technical Information) project, funded by the MAEE (French Ministry of Foreign and European Affairs).

This study attempts to better define the audience of the documents in the collection since they were put on the web, and to measure the volume of this audience and its change over time. The variety of users is examined (geographical origin, concentration or dispersion across the entire holdings). Finally, the users' various concerns are studied. The investigation analyses the modes and frequencies of internet consultations of the documents in PDF format, using the log files of the BEEP Apache server.

Methodology

The access log of an Apache server records every transaction in which a user accesses a file hosted by the server. The BEEP access log uses the "combined log format".
In addition to the "standard log format" information, this format records the "Referer" header and the "User-Agent" of the request.

To analyse consultations of the INSEPS dissertations (the PDF file of each dissertation), a file of "effective access" was created by applying several filtering steps to the lines of the Apache log file:
(1) selection of lines relating to PDF files of the INSEPS collection,
(2) selection of access lines with a status of "200",
(3) exclusion of access lines generated by spiders,
(4) exclusion of access lines using the "HEAD" method,
(5) exclusion of spam access lines.

The IP address on each line of the "effective consultation" file is then resolved to a country, using a dedicated PHP script that relies on the "MaxMind GeoIP Country Database".

Results

(1) The audience volume: evolution over time

Analysis of the "effective consultations" of the documents in the INSEPS collection gives the distribution of consultations per month (Table 1) and the average number of consultations smoothed by quarter (Table 2).

After the first two months, the number of consultations quickly rose to a "cruising level", peaking at 2,515 consultations per month in May 2011 (a monthly average of 16.5 consultations per document). March 2012 recorded the highest number of consultations (2,701, a monthly average of 17.7 per document), an increase of 66.8 per cent over March 2011.

The monthly average smoothed by quarter (Table 2) also shows the rapid rise in consultations after the first months of the collection being available on the internet. The first quarter of 2012 saw the highest monthly average (smoothed over its three months): 15.7.

A calendar-specific feature appears. The school holiday months (July-September 2011) saw a decline in consultations, with a "trough" in August 2011 (a monthly average of 10 consultations per document).
In contrast, the period when mid-term or final examinations are typically prepared (March-May) produces a peak in consultations.

This pattern suggests that a substantial share of the readership is academic, probably students required to produce dissertations similar to those in the collection. However, in the absence of direct data on the individuals who consult the documents, caution is necessary. …
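
The five filtering steps listed under Methodology can be illustrated with a minimal Python sketch. It is not the authors' script, whose details are not given here. The URL prefix, the spider markers and the spam referrer list are hypothetical placeholders, and treating "spam" as referrer spam is an assumption. The sample log lines are invented for illustration.

```python
import re

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical values, for illustration only (not taken from the study).
COLLECTION_PREFIX = "/collect/inseps/"
SPIDER_MARKERS = ("bot", "crawl", "spider", "slurp")
SPAM_REFERERS = ("spam.example",)


def is_effective(line):
    """Return True if a log line passes all five filtering steps."""
    m = COMBINED.match(line)
    if m is None:
        return False  # not a well-formed combined-format line
    path = m["path"].lower()
    agent = m["agent"].lower()
    referer = m["referer"].lower()
    # (1) keep only PDF files of the collection
    if not (path.startswith(COLLECTION_PREFIX) and path.endswith(".pdf")):
        return False
    # (2) keep only requests answered with status 200
    if m["status"] != "200":
        return False
    # (3) drop requests whose User-Agent looks like a spider
    if any(marker in agent for marker in SPIDER_MARKERS):
        return False
    # (4) drop HEAD requests
    if m["method"] == "HEAD":
        return False
    # (5) drop spam, assumed here to mean known spam referrers
    if any(domain in referer for domain in SPAM_REFERERS):
        return False
    return True


def effective_accesses(lines):
    """Filter an iterable of raw log lines down to effective accesses."""
    return [line for line in lines if is_effective(line)]


# Invented sample lines: one reader, one spider, one HEAD request, one 404.
SAMPLE = [
    '192.0.2.1 - - [12/May/2011:10:00:00 +0000] "GET /collect/inseps/doc1.pdf HTTP/1.1" 200 5120 "http://www.example.org/search" "Mozilla/5.0"',
    '192.0.2.2 - - [12/May/2011:10:01:00 +0000] "GET /collect/inseps/doc1.pdf HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '192.0.2.3 - - [12/May/2011:10:02:00 +0000] "HEAD /collect/inseps/doc2.pdf HTTP/1.1" 200 0 "-" "Mozilla/5.0"',
    '192.0.2.4 - - [12/May/2011:10:03:00 +0000] "GET /collect/inseps/doc3.pdf HTTP/1.1" 404 210 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for line in effective_accesses(SAMPLE):
        print(line)
```

Of the four sample lines, only the first passes every step. Matching spiders by User-Agent substring is a simple heuristic. A real analysis would probably need a maintained list of crawler signatures.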