Conversation

@sebastian-nagel

For quick testing of whether a robots.txt file is parsed as expected, e.g.

$ java ... SimpleRobotRulesParser http://www.robotstxt.org/robots.txt '*' http://www.robotstxt.org/norobots-rfc.txt
Checking URLs:
allowed         http://www.robotstxt.org/norobots-rfc.txt

The full set of options is:

SimpleRobotRulesParser <robots.txt> [[<agentname>] <URL>...]

Parse a robots.txt file
  <robots.txt>  URL pointing to robots.txt file.
                To read a local file use a file:// URL
                (parsed as http://example.com/robots.txt)
  <agentname>   user agent name to check for exclusion rules.
                If not defined check with '*'
  <URL>         check URL whether allowed or forbidden.
                If no URL is given show robots.txt rules
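The new main method is only a thin command-line wrapper around the existing parser API. As a rough illustration, the programmatic equivalent of the command above looks something like the sketch below; the parseContent signature, the example file name, and the example URLs are assumptions for illustration, not part of this pull request:

// Minimal sketch, assuming the crawler-commons 0.x API:
// parseContent(url, content, contentType, robotNames) and BaseRobotRules.isAllowed(url)
import java.nio.file.Files;
import java.nio.file.Paths;

import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;

public class RobotsCheck {
    public static void main(String[] args) throws Exception {
        // read a local robots.txt (the CLI accepts a file:// URL for the same purpose)
        byte[] content = Files.readAllBytes(Paths.get("robots.txt"));

        SimpleRobotRulesParser parser = new SimpleRobotRulesParser();
        // parse the rules as if they were served from http://example.com/robots.txt
        BaseRobotRules rules = parser.parseContent(
                "http://example.com/robots.txt", content, "text/plain", "*");

        // check a URL against the parsed rules, similar to the CLI output above
        String url = "http://example.com/some/page.html";
        System.out.println((rules.isAllowed(url) ? "allowed" : "disallowed") + "\t" + url);
    }
}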

The pull request also includes the following minor changes:

  • implement toString() for robot rules
  • fix line breaks in comments

@jnioche jnioche mentioned this pull request May 31, 2018
@jnioche jnioche added this to the 0.10 milestone Jun 4, 2018
@jnioche jnioche merged commit 0c75e75 into crawler-commons:master Jun 4, 2018

jnioche commented Jun 4, 2018

thanks @sebastian-nagel

jnioche added a commit that referenced this pull request Jun 4, 2018
Add main to SimpleRobotRulesParser for testing (#193)
@sebastian-nagel sebastian-nagel deleted the robots-parser-main branch June 5, 2018 08:40