Crawler


A simple web crawler written in Go. It recursively crawls URLs, starting from a given URL, until the desired recursion depth is reached.
The results are written to a JSON file.
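This is not the project's source, just a minimal sketch of the idea: the start URL, depth, and output name below are placeholders, and a real crawler would use an HTML parser and concurrent fetching rather than a regexp and plain recursion.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"regexp"
)

// Crude href extraction, good enough for the sketch.
var hrefRe = regexp.MustCompile(`href="(https?://[^"]+)"`)

// crawl visits url, records it, and recurses into discovered
// links until the depth budget is spent.
func crawl(url string, depth int, seen map[string]bool, out *[]string) {
	if depth == 0 || seen[url] {
		return
	}
	seen[url] = true
	*out = append(*out, url)

	resp, err := http.Get(url)
	if err != nil {
		return
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()

	for _, m := range hrefRe.FindAllStringSubmatch(string(body), -1) {
		crawl(m[1], depth-1, seen, out)
	}
}

func main() {
	var results []string
	crawl("https://example.com", 2, map[string]bool{}, &results)

	f, _ := os.Create("result.json")
	defer f.Close()
	json.NewEncoder(f).Encode(results)
	fmt.Println("crawled", len(results), "pages")
}
```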

Install

Download the latest release binary file from: https://github.com/OrlovM/crawler/releases/

Or

Get the source files with:

go get github.com/OrlovM/crawler

And build with:

make build

How to use

Run it with or without command-line flags.

Example:

./crawler --startURL https://google.com --depth 3 --concurrency 20 -v -p ./results/result.json

Command-line flags:

--startURL The URL to start from (default: "https://ya.ru")

--depth How far down into a website's page hierarchy the crawler goes (default: 2)

--concurrency The maximum number of goroutines running at the same time (default: 50); a sketch of this pattern follows the list

--verbose, -v Prints details about the crawling process (default: false)

--filepath, -p The path where the results file will be created. It can include a file name, e.g. "/a/b.json", or be a directory only, e.g. "/a/". If no file name is given, the name defaults to the Go time layout "2006-01-02T15:04:05.json"; an example of the layout follows the list (default: ./crawl_results/)
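A common way to cap goroutines in Go is a buffered channel used as a semaphore. This is a minimal sketch of that pattern, not the project's actual code; `concurrency` mirrors the --concurrency flag and the URL list is a placeholder:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	concurrency := 50
	urls := []string{"https://example.com/a", "https://example.com/b"}

	sem := make(chan struct{}, concurrency) // semaphore with `concurrency` slots
	var wg sync.WaitGroup

	for _, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks once `concurrency` workers are running
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			fmt.Println("fetching", u) // fetching and parsing would go here
		}(u)
	}
	wg.Wait()
}
```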
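The default file name pattern is Go's reference-time layout, so the generated name is simply the current timestamp formatted with it. A quick illustration (assuming the tool formats the name this way):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// "2006-01-02T15:04:05" is Go's reference time; ".json" has no
	// layout tokens, so it passes through literally.
	name := time.Now().Format("2006-01-02T15:04:05.json")
	fmt.Println(name) // e.g. 2023-03-31T12:30:05.json
}
```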
