Skip to content

Releases: crawler-commons/crawler-commons

crawler-commons-0.6

02 Jun 05:01

Choose a tag to compare

Release 0.6 (27/05/2015)

  • Issue 75: [Sitemaps] more robust parsing of XML elements (jnioche, kkrugler)
  • Issue 76: maven-java-formatter-plugin (jnioche)
  • Issue 73: Switch groupID in pom from com.google.code.crawler-commons to crawler-commons (jnioche)
  • Issue 71: Upgrade to Tika 1.8 (jnioche)
  • Issue 68: [Robots] Path matching should be case-sensitive (kkrugler)
  • Issue 67: [Sitemaps] Parsing of lastMod date should use time portion (kkrugler)
  • Issue 59: [Robots] Let SimpleRobotRules and its members implements the Serializable interface (kkrugler)
  • Issue 65: [Sitemaps] Make SiteMapTool simpler by removing the Recursive flag (Avi Hayun)
  • Issue 64: Upgraded to Tika 1.7 (jnioche)
  • Issue 32: [Robots] Resolve relative URL for sitemaps (jnioche)
  • Issue 62: [Sitemaps] Add new parseSiteMap method (jnioche)
  • Issue 57: [Sitemaps] SiteMap should contain a list of SitemapUrls instead of a table of them (Avi Hayun)
  • Issue 51: Upgrade httpclient to the latest version (Avi Hayun)
  • Issue 61: [Sitemaps] Sitemap Parser changes the processed flag unnecessarily (Avi Hayun)
  • Issue 56: [Sitemaps] SiteMap.setBaseUrl(...) causes the domain name to be lowered case which shouldn't happen (Avi Hayun)
  • Issue 50: Add Fetch Report to FetchedResult (lewismc, avraham2)
  • Issue 55: [Sitemaps] SitemapUrl "setPriority(String str)" should check for proper value (Avi Hayun)

crawler-commons-0.5

02 Jun 05:00

Choose a tag to compare

Release 0.5 - 09/10/2014

  • Issue 53: Spaces in a comma separated list of names in a User-agent: line cause rules to be applicable to all agents (kkrugler)
  • Issue 45: [Sitemaps] Upgrade code after release of Tika v1.6 (Avi Hayun)
  • Issue 48: Upgraded to Tika 1.6 (jnioche)
  • Issue 47: [Sitemaps] SiteMapParser Tika detection doesn't work well on some cases (Avi Hayun)
  • Issue 40: [Sitemaps] Add Tika MediaType Support (Avi Hayun)
  • Issue 39: [Sitemaps] Add the Parser a convenience method with only a URL argument (Avi Hayun via lewismc)
  • Issue 42: [Sitemaps] Add more JUnit tests (Avi Hayun via lewismc)
  • Issue 37: Upgrade the Slf4j logging Library to v1.7.7 (Avi Hayun via kkrugler)
  • Issue 41: Upgrade to JUnit v4 conventions in SiteMapParser (Avi Hayun via lewismc)
  • Issue 34: Upgrade the Slf4j logging in SiteMaps (Avi Hayun via lewismc)

crawler-commons-0.4

02 Jun 04:58

Choose a tag to compare

Release 0.4

  • Issue 13: Fix deprecation in Crawler Commons Code (lewismc via kkrugler)
  • Issue 8 : Upgrade of httpclient to v4.2.6 (Fuad Efendi, lewismc via kkrugler)
  • Issue 18: Support matching against query parameters in robots.txt rules (alparslanavci, kkrugler)
  • Issue 21: Follow Google example of giving Allow directives higher match weight than Disallow directives (y.vladimirov, via kkrugler)
  • Issue 22: Use longest-match-wins approach to matching URLs in robots.txt (kkrugler)
  • Issue 17: Support Googlebot-compatible regular expressions in URL specifications (alparslanavci. kkrugler)
  • Issue 31: Missing top level domains (jnioche, kkrugler)
  • Issue 23: Trivial improvements to UserAgent (lewismc)
  • Issue 30: SitemapIndex should allow to skip sitemaps (Sebastian Nagel, kkrugler)
  • cleanup of ANT build remnants lib and lib-ext

crawler-commons-0.3

02 Jun 04:57

Choose a tag to compare

Release 0.3

  • Upgraded to Tika 1.4 (jnioche)
  • [SiteMap] added utility class for testing sitemaps (jnioche)
  • Issue 16: remove ant scripts and configuration (lewismc)
  • Issue 27: [SiteMap] Unnecessary String concatenations when logging + in SiteMapURL.toString() (jnioche)
  • Issue 26: [SiteMap] Set correct default priority for URL in a sitemap file (jnioche)
  • Issue 25: [Robots] Robots parser should not lowercase sitemap URLs (jnioche)
  • Issue 29: [SiteMap] try urls when element is missing (jnioche)

crawler-commons-0.2

02 Jun 04:56

Choose a tag to compare

Release 0.2

  • Move to pure Maven for CC build lifecycle (lewismc)
  • Move Javadoc out of core code (lewismc)
  • Substantiate Javadoc (lewismc)
  • Review default.properties (lewismc)
  • add HTTP status code & reason to FetchedResult (Fuad Efendi via kkrugler)
  • support for multiple user agent names (Tejas Patil via kkrugler)
  • added javadoc generation, publish in /doc/javadoc (kkrugler)
  • switch to using eclipse-formatter.properties (kkrugler)
  • support robots.txt files that have UTF-16LE and UTF-16BE BOMs (kkrugler)
  • support for user agent names that contain spaces (kkrugler)
  • fixed handling of BOM in sitemaps (Vivek Magotra via kkrugler)
  • refactoring of SiteMap objects (Hannes Schwarz via jnioche)
  • added simple support for the file: protocol (kkrugler)
  • cleaned up packaging and added "install" target (kkrugler)

Crawler Commons 0.1

02 Jun 04:55

Choose a tag to compare

Release 0.1

  • parsing robots.txt
  • parsing sitemaps
  • URL analyzer which returns Top Level Domains
  • a simple HttpFetcher