@damian0815 damian0815 commented Oct 17, 2025

  • Basic functionality unit tests
  • warcio implementation
    • Validate that output is identical to the MRJob output for the "test" robotstxt in the MRJob repo
    • Validate on recent full-scale crawl output
  • fastwarc implementation
  • Unit test for text-encoding edge cases and validity (currently all test cases are completely valid UTF-8)
  • Check that output works with crawl-tools/server/seed/sitemaps/sitemaps_robotstxt.py
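
For the text-encoding item above, here is a minimal sketch of the kind of edge cases such a unit test might cover. It is not taken from this PR: `decode_robotstxt`, `EDGE_CASES`, and `check_edge_cases` are hypothetical names, and decoding with `errors="replace"` is an assumption about the desired behaviour, not a description of what the job actually does.

```python
import codecs

REPLACEMENT = "\ufffd"  # U+FFFD, substituted for undecodable bytes


def decode_robotstxt(payload: bytes) -> str:
    """Hypothetical helper (not from this PR): decode a robots.txt body without raising."""
    # Strip a leading UTF-8 byte order mark so it does not end up in the first directive.
    if payload.startswith(codecs.BOM_UTF8):
        payload = payload[len(codecs.BOM_UTF8):]
    # Assumption: invalid byte sequences are replaced rather than rejected.
    return payload.decode("utf-8", errors="replace")


# Illustrative payloads only; a real test would add cases seen in crawl data.
EDGE_CASES = {
    "valid_utf8": "User-agent: *\nDisallow: /caf\u00e9\n".encode("utf-8"),
    "utf8_bom": codecs.BOM_UTF8 + b"Sitemap: https://example.com/sitemap.xml\n",
    "latin1_bytes": "Disallow: /caf\u00e9\n".encode("latin-1"),
    "truncated_multibyte": b"Disallow: /\xe2\x82",
    "empty": b"",
}


def check_edge_cases() -> dict:
    """Decode every payload and confirm the result can be re-encoded as UTF-8."""
    results = {}
    for name, payload in EDGE_CASES.items():
        text = decode_robotstxt(payload)
        text.encode("utf-8")  # would raise if the decoded text were not encodable
        results[name] = text
    return results
```

Calling `check_edge_cases()` returns the decoded text per case, so a test can assert that valid input round-trips unchanged and that invalid input contains `REPLACEMENT` instead of raising.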

@damian0815 damian0815 marked this pull request as draft October 17, 2025 15:27
@damian0815 damian0815 marked this pull request as ready for review October 20, 2025 14:24
Signed-off-by: Damian Stewart <[email protected]>