Here's a link to the GitHub repository to view the source

ScrapeHero is a web scraper implemented in Python using the Scrapy and Selenium libraries.

I use it to continuously push song chart information from the website Chorus into a MongoDB database. The database has a unique index on each song's MD5 checksum, which prevents duplicate inserts. The scraper has two spiders: one scrapes the latest songs and the other scrapes random songs. I will run the random spider indefinitely via a bash script until I have ~20k songs with the desired attributes.
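
As a rough illustration of the duplicate handling described above, here is a minimal sketch of how a unique index and an idempotent write could look with pymongo. It is not taken from the repository: the field name `md5`, the database and collection names, the connection URI, and the helper names are all assumptions. It uses an upsert with `$setOnInsert` keyed on the hash instead of catching a duplicate-key error, and keeps the unique index as the backstop.

```python
from typing import Any, Mapping


def ensure_md5_index(collection: Any) -> None:
    """Create a unique index on the (assumed) "md5" field.

    create_index does nothing if an identical index already exists,
    so this can be called on every spider start.
    """
    collection.create_index("md5", unique=True)


def save_song(collection: Any, song: Mapping[str, Any]) -> bool:
    """Store a song unless one with the same md5 is already present.

    Returns True if a new document was inserted, False if it was a duplicate.
    """
    result = collection.update_one(
        {"md5": song["md5"]},          # match on the checksum
        {"$setOnInsert": dict(song)},  # only written when nothing matched
        upsert=True,
    )
    # upserted_id is set only when the upsert created a new document.
    return result.upserted_id is not None


def open_collection(
    uri: str = "mongodb://localhost:27017",  # hypothetical URI
    db_name: str = "chorus",                 # hypothetical database name
    coll_name: str = "songs",                # hypothetical collection name
) -> Any:
    """Connect to MongoDB; pymongo is imported lazily so the helpers above
    can be used (and tested) without a running server."""
    from pymongo import MongoClient

    return MongoClient(uri)[db_name][coll_name]
```

In a Scrapy project, something like `save_song` would typically be called from an item pipeline's `process_item`, with `ensure_md5_index` run once when the spider opens. Because the write is idempotent, re-scraping the same song from the random spider leaves the collection unchanged.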

The goal of this project is to implement the first step in creating ChartBot.