IMDB Web Crawler

Web crawler that fetches information about people born on a specified day from IMDB and writes it to a MongoDB database. Only the x most popular profiles are fetched, where x is set with --profileNo. The crawler part is based on Michael Okoko's blog post.

Note that requests are rate limited, as the client may be blocked if it sends too many requests.

The aim of this project was to learn about Go and web scraping/crawling.

Demo

Build and Run Instructions

Make sure Go is installed.

To compile, run:

go build ./crawler.go

Before running the program, run a MongoDB instance on port 27017. This can be easily done using Docker:

docker run --name mongo -p 27017:27017 -d mongo:4.4.6

Note that if MongoDB is not running, the crawler will still work, but writing to MongoDB will be disabled. When MongoDB is available, the crawler writes to the profiles collection in the crawler database. Both are created by the crawler if they don't already exist.

Then, run the crawler:

./crawler --day <day of birthday> --month <month of birthday> [--profileNo <number of profiles to fetch>] [--mongoUri <MongoDB URI>]

Alternatively, for development, go run can be used:

go run . --day <day of birthday> --month <month of birthday>

To get more help on how to run the program and to check the program defaults, run:

./crawler --help
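For readers new to Go, options like these can be handled with the standard `flag` package, which also generates the `--help` output. The following is a minimal sketch and not the code in crawler.go: the `options` struct, `parseOptions`, the validation ranges, and every default value are assumptions made for illustration. Run `./crawler --help` for the real defaults.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the command-line settings from the usage line.
// (Hypothetical type; field names are illustrative.)
type options struct {
	day, month, profileNo int
	mongoURI              string
}

// parseOptions parses args (without the program name).
// All defaults below are placeholders, not the program's real defaults.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("crawler", flag.ContinueOnError)
	fs.IntVar(&o.day, "day", 1, "day of birthday (1-31)")
	fs.IntVar(&o.month, "month", 1, "month of birthday (1-12)")
	fs.IntVar(&o.profileNo, "profileNo", 10, "number of profiles to fetch")
	fs.StringVar(&o.mongoURI, "mongoUri", "mongodb://localhost:27017", "MongoDB URI")
	if err := fs.Parse(args); err != nil {
		return o, err // includes flag.ErrHelp when --help is passed
	}
	if o.day < 1 || o.day > 31 || o.month < 1 || o.month > 12 {
		return o, fmt.Errorf("invalid date: day=%d month=%d", o.day, o.month)
	}
	return o, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

The `flag` package accepts both `-day` and `--day`, so either spelling works with this sketch.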

Running tests

Make sure you have a MongoDB instance running as described above. Then, run:

go test
