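Log Analysis and Text Mining on Internet Access to Dissertations of the INSEPS (Institut National Superieur de l'Education Populaire et du Sport) Dakar, Senegal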

Pierpaolo Rossi, Anastasie Thiaw
African Research & Documentation, pp. 79-90. Published 2012-01-01. Journal article.
DOI: https://doi.org/10.1017/s0305862x00020586
Citation count: 1

Abstract

Introduction

The dissertation collection of INSEPS (Higher National Institute of Popular Education and Sport, Dakar, Senegal) consists of 152 documents in PDF format, covering academic work submitted between 2005 and 2008 (2005: 21, 2006: 29, 2007: 45 and 2008: 57), together with references to all the dissertations available in the INSEPS library (imported from a CDS/ISIS database). Since January 2011 these have been hosted on the BEEP (Electronic libraries in partnership) website, which uses the Greenstone software (Rossi, 2011). The PDF files of the collection were created either by scanning paper copies or by converting electronic versions (Word files). The electronic documents were collected as part of the SIST (System for Scientific and Technical Information) project funded by the MAEE (French Ministry of Foreign and European Affairs).

This study attempts to better define the audience of the documents in the collection since their publication on the web. It also attempts to measure the size of this audience and how it changes over time. The variety of users is examined (geographical origin, concentration or dispersion of consultations across the whole collection). Finally, the users' various concerns are studied. The investigation is based on an analysis of the modes and frequencies of internet consultations of the documents in PDF format, using the log files of the BEEP Apache server.

Methodology

The access log of an Apache server records every request for a file hosted by the server and consulted by users. The BEEP access log uses the "combined log format". In addition to the information in the "standard log format", it records the "Referer" header and the "User-Agent" of each request.

To analyse consultations of the INSEPS dissertations (the PDF file of each dissertation), a file of "effective consultations" was created by applying several filtering steps to the lines of the Apache log file:

(1) selection of lines relating to PDF files of the INSEPS collection,
(2) selection of access lines with a status of "200",
(3) exclusion of spider access lines,
(4) exclusion of "HEAD" method access lines,
(5) exclusion of spam access lines.

The IP address on each line of the "effective consultations" file is then resolved to a country. IP address resolution is done with a dedicated PHP script that uses the "MaxMind GeoIP Country Database".

Results

(1) The audience volume: evolution over time.

Analysis of the "effective consultations" of the documents in the INSEPS collection gives the distribution of consultations per month (Table 1) and the average number of consultations smoothed by quarter (Table 2).

After the first two months, the number of consultations quickly rose to a "cruising level", peaking at 2,515 consultations per month in May 2011 (a monthly average of 16.5 consultations per document). March 2012 recorded the highest number of consultations (2,701, a monthly average of 17.7 per document). Compared to March 2011, the increase was 66.8 per cent.

The monthly average of consultations smoothed by quarter (Table 2) also shows the rapid rise in the average number of consultations after the first months of the collection's availability on the internet. The first quarter of 2012 saw the highest monthly average (smoothed over the first three months): 15.7.

A calendar-specific pattern appears. The school holiday months (July-September 2011) saw a decline in consultations, with a "trough" in August 2011 (a monthly average of 10 consultations per document). In contrast, the period when mid-term or final examinations are typically prepared (March-May) gives rise to a peak in consultations.

This pattern suggests that a substantial share of the readership is academic in origin, probably students required to produce dissertations similar to those presented here. However, in the absence of direct data on the individuals who consult the documents, it is necessary to remain cautious. …
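The five filtering steps in the Methodology, and the per-document averages quoted in the Results, can be illustrated with a minimal Python sketch. This is not the authors' script (the abstract only mentions a PHP script for the country lookup, which is not reproduced here). The collection path prefix, the spider keywords, the spam referrer list and the sample log lines are all hypothetical; the abstract does not say how spiders or spam were actually detected.

```python
import re
from collections import Counter
from datetime import datetime

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical values, not taken from the study.
COLLECTION_PREFIX = "/collect/inseps/"
BOT_HINTS = ("bot", "crawler", "spider", "slurp")
SPAM_REFERERS = ("spam.example",)
N_DOCUMENTS = 152  # collection size reported in the abstract


def is_effective(hit):
    """Apply filtering steps (1) to (5) to one parsed log line."""
    path = hit["path"].split("?", 1)[0].lower()
    agent = hit["agent"].lower()
    referer = hit["referer"].lower()
    return (
        path.startswith(COLLECTION_PREFIX) and path.endswith(".pdf")  # (1)
        and hit["status"] == "200"                                     # (2)
        and not any(hint in agent for hint in BOT_HINTS)               # (3)
        and hit["method"] != "HEAD"                                    # (4)
        and not any(spam in referer for spam in SPAM_REFERERS)         # (5)
    )


def monthly_consultations(lines):
    """Count effective consultations per month, keyed as 'YYYY-MM'."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that are not in combined log format
        hit = match.groupdict()
        if is_effective(hit):
            when = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
            counts[when.strftime("%Y-%m")] += 1
    return counts


def per_document_average(consultations, n_documents=N_DOCUMENTS):
    """Monthly consultations divided by the number of documents."""
    return consultations / n_documents


# Made-up log lines for illustration only.
SAMPLE = [
    '192.0.2.1 - - [12/May/2011:10:00:00 +0000] "GET /collect/inseps/memoire1.pdf HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [12/May/2011:10:01:00 +0000] "GET /collect/inseps/memoire1.pdf HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
    '192.0.2.3 - - [13/May/2011:10:02:00 +0000] "HEAD /collect/inseps/memoire2.pdf HTTP/1.1" 200 0 "-" "Mozilla/5.0"',
    '192.0.2.4 - - [14/May/2011:10:03:00 +0000] "GET /collect/inseps/memoire2.pdf HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    '192.0.2.5 - - [02/Jun/2011:09:00:00 +0000] "GET /collect/inseps/memoire3.pdf HTTP/1.1" 200 2048 "http://spam.example/x" "Mozilla/5.0"',
    '192.0.2.6 - - [03/Jun/2011:09:00:00 +0000] "GET /collect/inseps/memoire3.pdf HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(dict(monthly_consultations(SAMPLE)))
    print(round(per_document_average(2515), 1))  # 2,515 consultations over 152 documents
```

On the sample lines, only the first and the last request pass all five filters, giving one effective consultation in each of May and June 2011. Dividing the 2,515 consultations reported for May 2011 by the 152 documents reproduces the 16.5 per-document average quoted in the abstract.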