Scrapy-Based Crawling and User-Behavior Characteristics Analysis on Taobao
Authors: Jing Wang, Yuchun Guo
Published in: 2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2012-10-10
DOI: 10.1109/CyberC.2012.17
The widespread use of the Internet provides a favorable environment for e-commerce. Studies of e-commerce network characteristics typically focus on Taobao. So far, Taobao-based research has addressed the credit rating system, marketing strategy, the characteristics of sellers, and so on, and the purpose of all these studies is to analyze online marketing transactions in e-commerce. In this paper, we analyze an e-commerce network from the perspective of graph theory. Our contributions are twofold. (1) We crawl the Taobao share-platform using the Scrapy crawling framework. After analyzing the format of Taobao web pages in depth, and combining two sampling methods, breadth-first search (BFS) and Metropolis-Hastings random walk (MHRW), we ran the crawler on five PCs for 30 days. We also list the major problems encountered during crawling and present our final solutions. In addition, we crawled the data of one type of seller in order to analyze the relationships between sellers and buyers. (2) We analyze the characteristics of user behavior on the Taobao share-platform based on the obtained dataset, aiming to find the relationships between sellers and buyers that are connected by items on the share-platform. Surprisingly, we find that the share-platform serves as a tool through which some buyers advertise items for sellers with high credit scores, while other buyers merely help them sustain the platform.
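The abstract names BFS and MHRW as the two sampling methods but gives no implementation details. The following is a minimal sketch of both, assuming an undirected graph held in memory as an adjacency dictionary. The names `GRAPH`, `bfs_sample`, and `mhrw_sample` are hypothetical and do not come from the paper; in a real crawler the `neighbors` callable would presumably fetch and parse a user's page instead of reading a dictionary.

```python
import random
from collections import deque

# Hypothetical toy graph (undirected): node -> list of neighbors.
GRAPH = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}


def bfs_sample(neighbors, seed, budget):
    """Return up to `budget` nodes in breadth-first discovery order from `seed`."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < budget:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in seen:
                seen.add(v)
                visited.append(v)
                queue.append(v)
                if len(visited) >= budget:
                    break
    return visited


def mhrw_sample(neighbors, seed, steps, rng=None):
    """Metropolis-Hastings random walk of `steps` moves starting at `seed`.

    A neighbor v of the current node u is proposed uniformly at random and
    accepted with probability min(1, deg(u) / deg(v)); otherwise the walk
    stays at u. On a connected undirected graph this corrects the bias of a
    plain random walk toward high-degree nodes.
    """
    rng = rng or random.Random()
    current = seed
    walk = [current]
    for _ in range(steps):
        nbrs = neighbors(current)
        if not nbrs:  # isolated node: the walk cannot move
            walk.append(current)
            continue
        candidate = rng.choice(nbrs)
        # Guard against a zero degree, which cannot occur if the graph is
        # truly undirected (the candidate has `current` as a neighbor).
        cand_degree = max(1, len(neighbors(candidate)))
        if rng.random() < min(1.0, len(nbrs) / cand_degree):
            current = candidate
        walk.append(current)
    return walk
```

For example, `bfs_sample(GRAPH.__getitem__, "a", 3)` explores outward from `"a"` level by level, while `mhrw_sample(GRAPH.__getitem__, "a", 1000, random.Random(0))` returns a walk whose repeated entries mark rejected proposals. BFS covers a neighborhood densely but tends to over-represent high-degree nodes; MHRW trades coverage for a sample that is closer to uniform over nodes.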