Skip to content

Conversation

eleventigerssc
Copy link
Contributor

Context

The GCS support added through #2258 has a severe performance/liveness bug that manifests when using large GCS repositories. When ExternalResourceLister performs a "list" action the internal GcsClient will always try to read the full list of GCS bucket objects starting from the root. ExternalResourceLister expects to receive children of the URI parent that is provided however GcsClient fails to fulfil the contract by fetching and returning full list of objects for a bucket.

When using the google json API we can specify a "prefix" when listing objects on a bucket. This "prefix" can be extracted as the path from an URI that is provided to the GcsClient#list method. This fix allows us to correctly list objects for a specified bucket sub-path and gets rid of the performance issue.

Contributor Checklist

  • Review Contribution Guidelines
  • Sign Gradle CLA
  • Check "Allow edit from maintainers" option in pull request so that additional changes can be pushed by Gradle team
  • Provide integration tests (under <subproject>/src/integTest) to verify changes from a user perspective
  • Provide unit tests (under <subproject>/src/test) to verify logic
  • Update User Guide, DSL Reference, and Javadoc for public-facing changes
  • Ensure that tests pass locally: ./gradlew <changed-subproject>:check

Gradle Core Team Checklist

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation
  • Recognize contributor in release notes

@eleventigerssc eleventigerssc changed the title [Resources][GCS] Use URI path as prefix to minimize listed objects [Resources][GCS] Fix resource list operation correctness and performance Sep 26, 2017
@eleventigerssc
Copy link
Contributor Author

@eriwen @bmuschko would appreciate your input on this.

@bmuschko
Copy link
Contributor

Thanks for providing the pull request. We are going to prioritize soon.

@bmuschko bmuschko changed the title [Resources][GCS] Fix resource list operation correctness and performance Fix resource list operation correctness and performance Sep 26, 2017
@bmuschko bmuschko changed the title Fix resource list operation correctness and performance Fix GCS resource list operation correctness and performance Sep 26, 2017
@bmuschko
Copy link
Contributor

bmuschko commented Oct 2, 2017

Is there a way you could write an integration that demonstrate that the implementation is now more efficient and performant?

@eleventigerssc
Copy link
Contributor Author

@bmuschko I can add an integration test that makes sure that the GcsClient is not attempting to list all bucket for single resource. Performance impact can only be measured by communicating with GCS directly and I don't think it would be good to do in tests ;]

@eleventigerssc
Copy link
Contributor Author

@bmuschko I have updated GCS Maven integration tests with an additional test that triggers the case where repository is missing the root Maven metadata and Gradle falls back to listing files iteratively. The server stub was updated to verify that only requests with a specific prefix are used when listing module resources.

@eleventigerssc
Copy link
Contributor Author

Bump @bmuschko

@blindpirate blindpirate force-pushed the jd/fix-excessive-gcs-listing branch from 1de802f to 75afd6a Compare December 14, 2017 05:36
@blindpirate
Copy link
Collaborator

blindpirate commented Dec 14, 2017

@eleventigerssc Thanks for the excellent contribution, I've finished review and everything looks fine. I'm waiting for our CI feedback then I can merge it.

The only question is unrelated to this PR, in your previous contribution: https://github.com/eleventigerssc/gradle/blob/4da6ba7cfd72ab0027c890f9cc9e3db1e26310f0/subprojects/resources-gcs/src/main/java/org/gradle/internal/resource/transport/gcp/gcs/GcsClient.java#L162

Is this path = URLDecoder.decode(uri.getPath(), "UTF-8"); redundant? Looks like URI.getPath()'s result is already decoded:

    /**
     * Returns the decoded path component of this URI.
     *
     * <p> The string returned by this method is equal to that returned by the
     * {@link #getRawPath() getRawPath} method except that all sequences of
     * escaped octets are <a href="#decode">decoded</a>.  </p>
     *
     * @return  The decoded path component of this URI,
     *          or {@code null} if the path is undefined
     */
    public String getPath() {

@eleventigerssc
Copy link
Contributor Author

@blindpirate good point re: double decode. It looks like a plain oversight from my end however we should check why the original GCS support PR had manual / char replacement: https://github.com/gradle/gradle/pull/797/files#diff-e0dee2401703fdb7ba0f18e4ce0900ccR86 - maybe URI decode works differently?

@blindpirate
Copy link
Collaborator

@eleventigerssc Got it. Thanks for your reply.

@blindpirate blindpirate merged commit 581d89d into gradle:master Dec 15, 2017
@blindpirate
Copy link
Collaborator

@eleventigerssc Merged, thanks!

@eleventigerssc
Copy link
Contributor Author

@blindpirate awesome, thank you!

@eleventigerssc eleventigerssc deleted the jd/fix-excessive-gcs-listing branch December 18, 2017 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants