The freeCodeCamp Python Scrapy Beginners Course
The Complete 13-Part Scrapy Beginners Course
The freeCodeCamp Scrapy Beginners Course is a complete Scrapy beginners course that will teach you everything you need to know to start scraping websites at scale using Python Scrapy, including:
- Creating your first Scrapy spider
- Crawling through websites & scraping data from each page
- Cleaning data with Items & Item Pipelines
- Saving data to CSV files, MySQL & Postgres databases
- Using fake user-agents & headers to avoid getting blocked
- Using proxies to scale up your web scraping without getting banned
- Deploying your scraper to the cloud & scheduling it to run periodically
In Part 1: Scrapy & Course Overview we go through what Scrapy is and what you will learn in this course.
In Part 2: Setting Up Environment & Scrapy we go through how to set up your Python environment and install Scrapy.
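As a rough sketch, the setup usually boils down to a few commands (shown for macOS/Linux; on Windows the activation command is `venv\Scripts\activate`):

```bash
# Create and activate an isolated virtual environment for the project.
python -m venv venv
source venv/bin/activate

# Install Scrapy into it and confirm it installed correctly.
pip install scrapy
scrapy version
```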
In Part 3: Creating Scrapy Project we go through how to create a Scrapy project and explain all of its components.
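For reference, creating a project and scaffolding a spider is done with Scrapy's CLI (the `bookscraper` and `bookspider` names are illustrative placeholders used throughout these sketches):

```bash
# Generate a new Scrapy project and move into it.
scrapy startproject bookscraper
cd bookscraper

# Scaffold a spider that is allowed to crawl books.toscrape.com.
scrapy genspider bookspider books.toscrape.com
```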
In Part 4: First Scrapy Spider we go through how to create our first Scrapy spider to scrape BooksToScrape.com.
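A minimal version of such a spider might look like the sketch below (the CSS selectors match BooksToScrape.com's listing pages, but the exact fields the course extracts may differ):

```python
import scrapy


class BookSpider(scrapy.Spider):
    name = "bookspider"
    start_urls = ["https://books.toscrape.com/"]

    def parse(self, response):
        # Each book on a listing page sits inside an <article class="product_pod">.
        for book in response.css("article.product_pod"):
            yield {
                "name": book.css("h3 a::attr(title)").get(),
                "price": book.css(".product_price .price_color::text").get(),
                "url": book.css("h3 a::attr(href)").get(),
            }
```

Running `scrapy crawl bookspider` from the project directory executes the spider and prints the scraped items to the console.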
In Part 5: Crawling With Scrapy we go through how to create a more advanced Scrapy spider that crawls the entire BooksToScrape.com website and scrapes the data from each individual book page.
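A sketch of what that crawling logic can look like, assuming the same BooksToScrape.com selectors as above:

```python
import scrapy


class BookCrawlSpider(scrapy.Spider):
    name = "bookcrawler"
    start_urls = ["https://books.toscrape.com/"]

    def parse(self, response):
        # Follow the link to every individual book page on this listing page.
        for href in response.css("article.product_pod h3 a::attr(href)"):
            yield response.follow(href, callback=self.parse_book_page)

        # Follow the "next" pagination link until the last page is reached.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

    def parse_book_page(self, response):
        # Scrape the detailed data from the individual book page.
        yield {
            "title": response.css(".product_main h1::text").get(),
            "price": response.css(".product_main .price_color::text").get(),
        }
```

`response.follow()` resolves relative URLs for you, which is what keeps this pagination pattern so compact.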
In Part 6: Cleaning Data With Item Pipelines we go through how to use Scrapy Items & Item Pipelines to structure and clean your scraped data, which often arrives messy and unstructured.
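For example, an Item can declare the fields you expect, and a pipeline can normalise them as they pass through (a minimal sketch; the course's actual pipelines clean more fields than this):

```python
import scrapy
from itemadapter import ItemAdapter


class BookItem(scrapy.Item):
    # Declaring fields up front keeps the scraped data structured.
    name = scrapy.Field()
    price = scrapy.Field()


class PriceToFloatPipeline:
    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        # Strip the "£" currency symbol and store the price as a float.
        price = adapter.get("price")
        if price:
            adapter["price"] = float(price.replace("£", ""))
        return item
```

Pipelines only run once they are enabled in `settings.py`, e.g. `ITEM_PIPELINES = {"bookscraper.pipelines.PriceToFloatPipeline": 300}`.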
In Part 7: Saving Data To Files & Databases we go through how to save our scraped data to CSV files and to MySQL & Postgres databases.
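Saving to CSV needs no extra code, since Scrapy's feed exports handle it from the command line (`scrapy crawl bookspider -O books.csv`). Database storage is typically done in a pipeline; here is a sketch for MySQL using the `mysql-connector-python` package (the table name, columns and credentials are all hypothetical):

```python
import mysql.connector


class SaveToMySQLPipeline:
    def open_spider(self, spider):
        # Placeholder credentials; swap in your own connection details.
        self.conn = mysql.connector.connect(
            host="localhost", user="root", password="password", database="books_db"
        )
        self.cur = self.conn.cursor()

    def process_item(self, item, spider):
        # Insert each scraped item as a row in the (assumed) books table.
        self.cur.execute(
            "INSERT INTO books (name, price) VALUES (%s, %s)",
            (item["name"], item["price"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cur.close()
        self.conn.close()
```

The Postgres version follows the same pattern with `psycopg2` in place of `mysql.connector`.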
In Part 8: Faking Scrapy Headers & User-Agents we go through how to use fake headers and user-agents to help prevent your scrapers from getting blocked.
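One common approach is a small downloader middleware that swaps in a random user-agent on every request (a sketch; the two user-agent strings are illustrative, and a real pool would be far larger):

```python
import random

# Illustrative pool; in practice you'd load hundreds of strings
# or fetch them from a fake user-agent API.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


class RandomUserAgentMiddleware:
    def process_request(self, request, spider):
        # Attach a randomly chosen user-agent to every outgoing request.
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
```

Like pipelines, the middleware must be registered, here via `DOWNLOADER_MIDDLEWARES` in `settings.py`.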
In Part 9: Rotating Proxies & Proxy APIs we go through how to use rotating proxy pools to hide your IP address and scrape at scale without getting blocked.
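Scrapy routes a request through a proxy whenever `request.meta["proxy"]` is set, so a basic rotating pool can be another small middleware (the proxy URLs below are placeholders; a real list would come from your proxy provider or a proxy API):

```python
import random

# Hypothetical proxy endpoints; substitute your provider's pool.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]


class RotatingProxyMiddleware:
    def process_request(self, request, spider):
        # Send each request through a randomly chosen proxy.
        request.meta["proxy"] = random.choice(PROXIES)
```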
In Part 10: Deploying & Scheduling Spiders With Scrapyd we go through how you can deploy and run your spiders in the cloud with Scrapyd.
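In outline, a Scrapyd workflow looks something like this (it assumes your `scrapy.cfg` has a `[deploy]` section pointing at the server):

```bash
# Install the Scrapyd server and the scrapyd-client deploy tool.
pip install scrapyd scrapyd-client

# Start the Scrapyd server (it listens on port 6800 by default).
scrapyd

# From the project directory, deploy the project to the server...
scrapyd-deploy default -p bookscraper

# ...then schedule a run via Scrapyd's JSON API.
curl http://localhost:6800/schedule.json -d project=bookscraper -d spider=bookspider
```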
In Part 11: Deploying & Scheduling Spiders With ScrapeOps we go through how you can deploy, schedule and run your spiders on any server with ScrapeOps.
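ScrapeOps drives deployment and scheduling from its dashboard, so most of the setup is point-and-click; on the code side you install its Scrapy SDK and add your API key to `settings.py`. The snippet below reflects ScrapeOps' documented setup at the time of writing, but treat the exact extension path as an assumption and check their current docs:

```python
# settings.py — ScrapeOps monitoring SDK (install with: pip install scrapeops-scrapy)
SCRAPEOPS_API_KEY = "YOUR_API_KEY"  # placeholder; use your own key

EXTENSIONS = {
    # Extension path as given in ScrapeOps' docs; verify against current docs.
    "scrapeops_scrapy.extension.ScrapeOpsMonitor": 500,
}
```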
In Part 12: Deploying & Scheduling Spiders With Scrapy Cloud we go through how you can deploy, schedule and run your spiders on Zyte's Scrapy Cloud platform.
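Deployment to Scrapy Cloud is handled by Zyte's `shub` command line tool; a minimal sketch (it will prompt for your API key and project ID the first time):

```bash
# Install the Scrapy Cloud command line tool.
pip install shub

# Log in with your Scrapy Cloud API key, then deploy the project.
shub login
shub deploy
```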
In Part 13: Conclusion & Next Steps we summarize what we learned during this course and outline some of the next steps you can take to advance your knowledge of Scrapy even further.