Crawler

This module crawls the given list of domains and tries to elicit drive-by downloads by performing grid-based clicks on each page (see the sketch after the list below). For every crawled site:

  1. A screenshot of the main page and of every pop-up is captured.
  2. The resources requested by the page (including drive-by downloads) are downloaded for inspection using a background cURL process.
  3. Infrastructure data for the website and for the sites hosting the requested resources is also collected.
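
For reference, below is a minimal, hedged sketch of how grid-based clicking and screenshot capture could be driven from Python with Selenium's PhantomJS driver. It is illustrative only: the file names, viewport size, and grid step are assumptions rather than the crawler's actual parameters, and the repository's own scripts may implement this differently.

```python
# grid_click_sketch.py -- illustrative only; not the crawler's actual code.
# Assumes Selenium with the PhantomJS driver (Python 2.7 era API).
from selenium import webdriver

URL = "http://example.com"      # hypothetical target domain
WIDTH, HEIGHT = 1366, 768       # assumed viewport size
GRID_STEP = 100                 # assumed spacing between click points, in pixels

driver = webdriver.PhantomJS()
driver.set_window_size(WIDTH, HEIGHT)
driver.get(URL)
driver.save_screenshot("main_page.png")        # screenshot of the main page

# Grid-based clicks: click whatever element sits at each grid point.
for y in range(0, HEIGHT, GRID_STEP):
    for x in range(0, WIDTH, GRID_STEP):
        driver.execute_script(
            "var el = document.elementFromPoint(arguments[0], arguments[1]);"
            "if (el) { el.click(); }", x, y)

# Capture every pop-up window opened by the clicks.
main_handle = driver.current_window_handle
for i, handle in enumerate(driver.window_handles):
    if handle != main_handle:
        driver.switch_to.window(handle)
        driver.save_screenshot("popup_%d.png" % i)

driver.quit()
```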

Requirements

  1. PhantomJS
  2. Python 2.7

Install all the Python libraries with pip install -r requirements.txt

Usage

To crawl one domain like google.com:

bash start.sh google.com crawled_websites/google.com

To crawl a list of domains stored in a file called "website_list":

bash run_crawler.sh website_list
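
The exact format of "website_list" is not documented here; a plain list with one domain per line, as assumed below, is the likely shape:

```
google.com
example.org
gatech.edu
```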

To spawn multiple crawler instances (8 in this example), each crawling its own list file like "website_list":

bash run_multiple_crawler.sh 8

The naming convention for these list files is "website_list_[num]", where [num] ranges from 1 to the number of instances (a hedged helper for producing such files is sketched below).
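
The repository is not shown here to include a helper for producing these numbered lists, so the following sketch is hypothetical: it splits a master domain list round-robin into website_list_1 .. website_list_N. The script name and split strategy are assumptions.

```python
# split_list.py -- hypothetical helper, not part of this repository.
# Splits a master domain list into website_list_1 .. website_list_N,
# matching the naming convention used by run_multiple_crawler.sh.
import sys

def split_list(master_file, num_instances):
    with open(master_file) as f:
        domains = [line.strip() for line in f if line.strip()]
    for i in range(num_instances):
        chunk = domains[i::num_instances]              # round-robin split
        with open("website_list_%d" % (i + 1), "w") as out:
            out.write("\n".join(chunk) + "\n")

if __name__ == "__main__":
    # e.g. python split_list.py all_domains 8
    split_list(sys.argv[1], int(sys.argv[2]))
```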

To verify all the files downloaded on a given date with VirusTotal (VT) and merge all the data into a single file:

bash start_virustotal.sh 04-04-17
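
The internals of start_virustotal.sh are not shown here. As a rough sketch of the underlying idea, a downloaded file's hash can be looked up against the VirusTotal public API v2 (current as of 2017). The script name, API key placeholder, and use of the requests library are assumptions; the repository's script may work differently.

```python
# vt_check_sketch.py -- illustrative only; start_virustotal.sh may differ.
import hashlib
import sys

import requests  # assumed to be available (e.g. via requirements.txt)

API_KEY = "YOUR_VT_API_KEY"  # placeholder; a real VirusTotal key is required

def vt_report(path):
    """Return (positives, total) for a file, or None if VT has no record of it."""
    with open(path, "rb") as f:
        sha256 = hashlib.sha256(f.read()).hexdigest()
    resp = requests.get(
        "https://www.virustotal.com/vtapi/v2/file/report",
        params={"apikey": API_KEY, "resource": sha256})
    report = resp.json()
    if report.get("response_code") == 1:   # 1 => VirusTotal knows this file
        return report["positives"], report["total"]
    return None

if __name__ == "__main__":
    print(vt_report(sys.argv[1]))
```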

About

This module is a part of the project developed by "Team 3" for CS 6262 (Network Security), Spring 2017 at GaTech.
