Implementation of a DSL for Web Scraping

Shail Shah, Shashank Shyam Shankar, N. Rachana, S. Preetha
Journal: INTERNATIONAL JOURNAL OF RECENT TRENDS IN ENGINEERING & RESEARCH
Publication type: Journal Article
Publication date: 2020-05-31
DOI: 10.23883/IJRTER.2020.6028.LBIFZ (https://doi.org/10.23883/IJRTER.2020.6028.LBIFZ)
Citation count: 0

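Abstract

The main goal of this project is to implement a DSL for web scraping. A Domain Specific Language (DSL) is a language created to serve a single purpose and used in only one domain; in our project, that domain is web scraping. Our main aim is to create a simple scripting language with easy-to-use syntax and many features that help the user scrape the web easily. Currently, web scraping is a tedious process: the majority of it is done by means of modules in high-level languages. This requires the user to have in-depth knowledge of the high-level language as well, and thus precludes many laymen from easy web scraping. This project provides a DSL with highly simplified syntax that does not assume any skill on the part of the user, so anyone can use the language to scrape the web with no previous knowledge of the domain. The DSL has been implemented using Python and its scraping libraries. With this, many features and functionalities can be implemented in the DSL, providing an effective tool for web scraping without compromising on simplicity.

Keywords: Domain Specific Language, Web Scraping, Python, Beautiful Soup 4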
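
The abstract does not show the DSL's syntax, so the following is only a minimal sketch of how a small scraping language might be interpreted in Python. The statements `select`, `get text`, and `get attr`, and the names `run_script` and `_Collector`, are invented for illustration and are not the authors' design. To stay self-contained, the sketch parses HTML with the standard library's `html.parser` rather than Beautiful Soup 4, which the paper names, and it works on an HTML string instead of fetching a page.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they carry no text content.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _Collector(HTMLParser):
    """Collects attributes and text for every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.matches = []   # one {"attrs": dict, "text": str} per element
        self._open = []     # matched elements that are not yet closed

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            match = {"attrs": dict(attrs), "text": ""}
            self.matches.append(match)
            if tag not in VOID_TAGS:
                self._open.append(match)

    def handle_endtag(self, tag):
        if tag == self.tag and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every matched element that is still open.
        for match in self._open:
            match["text"] += data


def run_script(script, html):
    """Run a hypothetical scraping script against an HTML string.

    Statements (all invented for this sketch):
        select <tag>       choose every element with that tag name
        get text           emit the text of each selected element
        get attr <name>    emit that attribute of each selected element
    Blank lines and lines starting with '#' are ignored.
    """
    selected, output = [], []
    for line_no, line in enumerate(script.splitlines(), start=1):
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "select" and len(words) == 2:
            collector = _Collector(words[1].lower())
            collector.feed(html)
            collector.close()
            selected = collector.matches
        elif words == ["get", "text"]:
            output.extend(m["text"].strip() for m in selected)
        elif words[:2] == ["get", "attr"] and len(words) == 3:
            name = words[2].lower()
            output.extend(m["attrs"][name] for m in selected
                          if name in m["attrs"])
        else:
            raise SyntaxError(f"line {line_no}: cannot parse {line!r}")
    return output


page = ('<ul><li><a href="/a">First</a></li>'
        '<li><a href="/b">Second <b>item</b></a></li></ul>')
script = """
# list every link's label, then its target
select a
get text
get attr href
"""
print(run_script(script, page))  # ['First', 'Second item', '/a', '/b']
```

The point of the sketch is the division of labour the abstract describes: the user writes three short statements, while the tag matching and text extraction stay hidden in the host language.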