Crawler


A simple web crawler written in Go. It recursively crawls URLs, starting from a given URL, until the desired recursion depth is reached.
The results are written to a JSON file.
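This is not the project's source, just a minimal sketch of the idea: the start URL, depth, and output name below are placeholders, and a real crawler would use an HTML parser and concurrent fetching rather than a regexp and plain recursion.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"regexp"
)

// Crude href extraction, good enough for the sketch.
var hrefRe = regexp.MustCompile(`href="(https?://[^"]+)"`)

// crawl visits url, records it, and recurses into discovered
// links until the depth budget is spent.
func crawl(url string, depth int, seen map[string]bool, out *[]string) {
	if depth == 0 || seen[url] {
		return
	}
	seen[url] = true
	*out = append(*out, url)

	resp, err := http.Get(url)
	if err != nil {
		return
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()

	for _, m := range hrefRe.FindAllStringSubmatch(string(body), -1) {
		crawl(m[1], depth-1, seen, out)
	}
}

func main() {
	var results []string
	crawl("https://example.com", 2, map[string]bool{}, &results)

	f, _ := os.Create("result.json")
	defer f.Close()
	json.NewEncoder(f).Encode(results)
	fmt.Println("crawled", len(results), "pages")
}
```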

Install

Download the latest release binary file from: https://github.com/OrlovM/crawler/releases/

Or

Get the source files with:

go get github.com/OrlovM/crawler

And build with:

make build

How to use

Run it with or without command-line flags.

Example:

./crawler --startURL https://google.com --depth 3 --concurrency 20 -v -p ./results/result.json

Command-line flags:

--startURL The URL to start from (default: "https://ya.ru")

--depth How far down into a website's page hierarchy the crawler goes (default: 2)

--concurrency The maximum number of goroutines running at the same time (default: 50); a sketch of this pattern follows the list

--verbose, -v Prints details about the crawling process (default: false)

--filepath, -p The path where the results file will be created. It can include a file name, e.g. "/a/b.json", or be a directory only, e.g. "/a/". If no file name is given, the name defaults to the Go time layout "2006-01-02T15:04:05.json"; an example of the layout follows the list (default: ./crawl_results/)
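A common way to cap goroutines in Go is a buffered channel used as a semaphore. This is a minimal sketch of that pattern, not the project's actual code; `concurrency` mirrors the --concurrency flag and the URL list is a placeholder:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	concurrency := 50
	urls := []string{"https://example.com/a", "https://example.com/b"}

	sem := make(chan struct{}, concurrency) // semaphore with `concurrency` slots
	var wg sync.WaitGroup

	for _, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks once `concurrency` workers are running
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			fmt.Println("fetching", u) // fetching and parsing would go here
		}(u)
	}
	wg.Wait()
}
```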
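The default file name pattern is Go's reference-time layout, so the generated name is simply the current timestamp formatted with it. A quick illustration (assuming the tool formats the name this way):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// "2006-01-02T15:04:05" is Go's reference time; ".json" has no
	// layout tokens, so it passes through literally.
	name := time.Now().Format("2006-01-02T15:04:05.json")
	fmt.Println(name) // e.g. 2023-03-31T12:30:05.json
}
```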
