The module crawls a given list of domains, trying to elicit drive-by downloads by performing grid-based clicks on each page.
- A screenshot of the main page and of every pop-up is captured.
- The resources requested by the page (including drive-by downloads) are downloaded for inspection using a background cURL process (see the sketch after this list).
- Infrastructure data for the website and for the sites hosting the requested resources is also collected.
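As a rough illustration of the background cURL fetch described above (the URL variable and the downloads/ directory below are placeholders, not the crawler's actual code):

curl --silent --location "$resource_url" --output "downloads/$(basename "$resource_url")" &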
Requirements:
- PhantomJS
- Python 2.7
Install all the Python libraries:
pip install -r requirements.txt
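Optionally, the dependencies can be isolated in a virtualenv first (assuming virtualenv is available for the system's Python 2.7; the directory name venv is just an example):

virtualenv -p python2.7 venv
source venv/bin/activate
pip install -r requirements.txt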
To crawl a single domain, e.g. google.com:
bash start.sh google.com crawled_websites/google.com
To crawl a list of domains stored in a file called "website_list":
bash run_crawler.sh website_list
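The "website_list" file is assumed to contain one domain per line, for example:

google.com
example.org
wikipedia.org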
To spawn multiple crawler instances, each crawling its own "website_list"-style file (here, 8 instances):
bash run_multiple_crawler.sh 8
The naming convention for the list files is "website_list_[num]", where [num] ranges from 1 to the number of instances.
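A master list can be split into the expected per-instance files with GNU split, for example for 8 instances (this helper is a convenience, not part of the repository):

split --number=l/8 --numeric-suffixes=1 --suffix-length=1 website_list website_list_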
To verify all the files downloaded on a given date with VirusTotal and merge the results into a single file:
bash start_virustotal.sh 04-04-17
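As a convenience (not part of the repository), the verification can be scheduled nightly with cron; the example below assumes a DD-MM-YY date format, so adjust the format string if the scripts expect a different ordering, and replace /path/to/crawler with the checkout path:

0 2 * * * cd /path/to/crawler && bash start_virustotal.sh "$(date +\%d-\%m-\%y)"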