Shail Shah, Shashank Shyam Shankar, N. Rachana, S. Preetha
Implementation of DSL for Web Scraping
International Journal of Recent Trends in Engineering & Research. Journal Article. Published 2020-05-31.
DOI: 10.23883/IJRTER.2020.6028.LBIFZ (https://doi.org/10.23883/IJRTER.2020.6028.LBIFZ)
Citations: 0
Abstract
The main goal of this project is to implement a DSL for web scraping. A Domain Specific Language (DSL) is a language created to serve a single purpose and used in only one domain; in our project, that domain is web scraping. Our aim is to create a simple scripting language with easy-to-use syntax and features that help the user scrape the web easily. Currently, web scraping is a tedious process: most of it is done by means of modules in high-level languages. This requires the user to have in-depth knowledge of the high-level language as well, and thus precludes many laymen from easy web scraping. This project provides a DSL with highly simplified syntax that does not assume any skill from the user. Thus, anyone would be able to use this language to scrape the web with no previous knowledge of the domain. The DSL has been implemented using Python and its scraping libraries. With this, many features and functionalities can be implemented in the DSL, thus providing an effective tool for web scraping without compromising on simplicity.

Keywords: Domain Specific Language, Web Scraping, Python, Beautiful Soup 4
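To make the idea of a scraping DSL on top of Python concrete, here is a minimal sketch of how such an interpreter might look. It is not the authors' language: the commands `load` and `extract`, the `run_script` function, and the `PAGES` dictionary are all hypothetical names invented for illustration. To keep the example self-contained it parses HTML with the standard library's `html.parser` rather than Beautiful Soup 4, and it reads pages from an in-memory dictionary instead of fetching them over the network.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collects the text found inside every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside a matching tag
        self.results = []   # one string per outermost matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.results[-1] += data


def run_script(script, pages):
    """Interpret a tiny, hypothetical scraping DSL.

    Supported commands (invented for this sketch):
      load <name>     : pick an HTML document from `pages`
      extract <tag>   : collect the text of every <tag> element
    """
    html = ""
    output = []
    for line in script.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        command, _, argument = line.partition(" ")
        argument = argument.strip()
        if command == "load":
            html = pages[argument]
        elif command == "extract":
            collector = TagTextCollector(argument)
            collector.feed(html)
            collector.close()
            output.extend(text.strip() for text in collector.results)
        else:
            raise ValueError(f"unknown command: {command!r}")
    return output


# Stand-in for a fetched web page (an assumption; no network access here).
PAGES = {
    "demo": "<html><body><h1>News</h1>"
            "<a href='/a'>First</a><a href='/b'>Second</a></body></html>"
}

SCRIPT = """
# a two-line scraping script in the hypothetical DSL
load demo
extract a
"""

links = run_script(SCRIPT, PAGES)
print(links)
```

The point of the sketch is the division of labour the abstract describes: the user writes two short commands, while the parsing logic stays hidden in the host language. A fuller implementation would presumably replace the dictionary lookup with an HTTP fetch and the collector class with a scraping library.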