Skip to content

Conversation

kurahaupo
Copy link

@kurahaupo kurahaupo commented Jul 9, 2021

Fixes #16055, #17215, gradle-private#3386 and possibly others TBA

Context

Mostly adjustments to the gradlew script:

  • Allow gradlew to pass shellcheck.
  • Restore "any POSIX shell" as prerequisite (not just Bash, which was recently made a requirement).
  • Address significant technical debt.

I'm approaching this from the perspective of an experienced shell user who encountered the gradlew script inside another project, and I was frankly horrified when I opened the lid.

This has turned out to be a lot steeper learning curve and a lot more work than I expected, so I would like early feedback on whether any of it is likely to be accepted by the project.

Contributor Checklist

Remaining answers are "not yet".

  • no new functionality, so it's unclear what tests I should add
  • I don't yet have an environment where I can run the test suite.
  • Provide integration tests (under <subproject>/src/integTest) to verify changes from a user perspective
  • Provide unit tests (under <subproject>/src/test) to verify logic
  • Update User Guide, DSL Reference, and Javadoc for public-facing changes
  • Ensure that tests pass sanity check: ./gradlew sanityCheck
  • Ensure that tests pass locally: ./gradlew <changed-subproject>:quickTest

Gradle Core Team Checklist

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation
  • Recognize contributor in release notes

@kurahaupo
Copy link
Author

I have a few questions about gradlew and I'm not really clear where I should ask them.

Firstly, is any of this worth my time?

Commit 4cca85e was to ensure that old pre-POSIX shells were still supported, but commit ff09404 jumped to the other extreme and now it doesn't support POSIX shells, just Bash. This PR reverses that: any POSIX shell should suffice. (Where /bin/sh is a pre-POSIX shell, gradlew can still be run but needs to be prefaced by the name of a POSIX shell; ksh suffices on Solaris.)

Secondly, there are a bunch of template variables subprojects/plugins/src/main/resources/org/gradle/api/internal/plugins/unixStartScript.txt, which as far as I can see are fixed boilerplate. Is there any reason for them to remain, rather than use their assigned values directly?

${appNameSystemProperty}    org.gradle.appname
${applicationName}          Gradle
${mainClassName}            org.gradle.wrapper.GradleWrapperMain
${optsEnvironmentVar}       GRADLE_OPTS

I assume that ${defaultJvmOpts} will vary for the system gradlew is built on?

Thirdly, I note that GRADLE_OPTS was hard-coded in some places but ${optsEnvironmentVar} was used elsewhere; was that combination intentional?

@kurahaupo
Copy link
Author

You may notice that I replaced #!/usr/bin/env sh with #!/bin/sh.

If the previous concerns about executable code in environment variables is valid, then unsafe entries in PATH in particular are also an issue. Removing /usr/bin/env addresses this.

In any case, /bin/sh and /usr/bin/env have the same near-certain guarantees from POSIX, and in practice /bin/sh has fewer unsupported systems. So we might as well close the hole.

@ljacomet
Copy link
Member

ljacomet commented Jul 9, 2021

@big-guy Can you take a look here given the work around switching to bash?

Copy link
Member

@big-guy big-guy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions about gradlew and I'm not really clear where I should ask them.

Firstly, is any of this worth my time?

Yes, this is greatly appreciated. Both Vaidotas and I knew this was a mess, but we didn't have enough experience to make it better.

Secondly, there are a bunch of template variables subprojects/plugins/src/main/resources/org/gradle/api/internal/plugins/unixStartScript.txt, which as far as I can see are fixed boilerplate. Is there any reason for them to remain, rather than use their assigned values directly?

Yes, this needs to remain. The same template is used to generate start scripts for any Java project that uses the application plugin, so the value of these variables can be different and non-Gradle specific.

I assume that ${defaultJvmOpts} will vary for the system gradlew is built on?

No, they're the same for all gradlew and potentially different for non-Gradle projects.

Thirdly, I note that GRADLE_OPTS was hard-coded in some places but ${optsEnvironmentVar} was used elsewhere; was that combination intentional?

This looks like it was a mistake and can just be removed. The Xdock arguments no longer do anything.

I took a look at the changes so far and I have a few high level comments:

For these two files, could you move these changes into a separate PR/branch? They're CI infrastructure changes and not related to what we ship.

  • .github/hub_scripts/pr_ci.sh
  • subprojects/performance/docs/check-rev.sh

.teamcity/mvnw is also a CI infrastructure change, but I think it may be auto-generated, so it's likely your changes will get overwritten at some point. The upstream project for this is Apache Maven, so I think it's OK if you just drop these changes for now.

You don't need to worry about making changes directly to gradlew since it's generated from unixStartScript.txt. What will happen is your changes will get incorporated into a nightly version of Gradle and when we update to use that nightly the gradlew script will be regenerated.

So I think that means you only need to submit changes for unixStartScript.txt. If you run ./gradlew plugins:quickTest locally that should test enough to shake out a big breakage. I can trigger a full pipeline that will test macOS/Windows/Linux more thoroughly.

@kurahaupo kurahaupo force-pushed the posix_shell branch 2 times, most recently from 84304d5 to f8cbcb3 Compare July 9, 2021 22:37
@kurahaupo
Copy link
Author

I'm working on unwinding the changes to .github/hub_scripts/pr_ci.sh, subprojects/performance/docs/check-rev.sh, and .teamcity/mvnw.

When other projects use the template, what do they call the wrapper script? Is it always gradlew?
If not, is there a template substitution parameter that contains the name of the script, that I can use in the documentation block at the top?

@kurahaupo
Copy link
Author

kurahaupo commented Jul 10, 2021

@big-guy I do appreciate your testing, as I'm currently stuck with

$ ./gradlew plugins:quickTest
Gradle 7.2-20210704221017+0000 requires Java 8 or later to run. You are currently using Java 7.

Obviously I need to update Java before I can use Gradle, and that's been my work in progress for a while, but when I noticed the recent updates I thought I should send my PR before things diverge any further.

I understand that as gradlew is a build artefact it doesn't technically need to be patched, but my inability to do local testing is one reason that I've left these patches in place, so that if there are any questions about specific commits, or any changes requested, I can modify gradlew and then use that as the basis for patching the template. Let me know if you need these stripped out, or whether you're OK to simply let them be overwritten by the next nightly build after merging.

Also, I'm rebasing my changing onto master before each submission to ensure a clean merge. What's your policy on doing that?

@kurahaupo
Copy link
Author

Of course, the other audience is to explain some fairly nuanced shell script to people whose primary skillset is Java. I hope my comments succeed in that respect; I would certainly appreciate feedback if anything is unclear.

@kurahaupo
Copy link
Author

Sorry, pushed again as I forgot to elide the change to check-rev.sh.

@kurahaupo
Copy link
Author

@big-guy github still thinks I have outstanding changes requested by you but I can't find them (probably because they're based on commits that I've rebased out of existence).

If you have time to take another look that would be appreciated.

@kurahaupo
Copy link
Author

kurahaupo commented Jul 10, 2021

Do you need me to write any additional tests, seeing as how there's "in theory" no modified behaviour.

I can think of some cases where behaviour might change:

  1. Under CygWin and Msys, $GRADLE_CYGPATTERN is optionally used to detect when a string is a filename that should be rewritten as a Windows path. Because I've changed the order of tests, it's theoretically possible that it now won't match a valid filename starting with a -, however I don't see how that could be useful because cygpath would have interpreted such as string as an option, not a filename.

This also raises the possibility of bugs I should fix while I'm at it:

  1. Also for Cygwin & Msys, should the expansions of $DEFAULT_JVM_OPTS, $JAVA_OPTS, and $GRADLE_OPTS be subject to file path cygpath rewriting? (They weren't, and they're still not, but should they be?)
  2. The die and warn functions (in lines 119-128) should probably write to stderr rather than stdout.
  3. The command ulimit -H -n at line 169 is used to compute a value which may wind up not being used; in that case there's no point issuing a warning at line 177.
  4. Should Darwin really be excluded from setting ulimit -n? Darwin has a BSD UNIX kernel, so it should just work.

Copy link
Member

@big-guy big-guy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully I haven't missed any of your questions.

When other projects use the template, what do they call the wrapper script? Is it always gradlew?
If not, is there a template substitution parameter that contains the name of the script, that I can use in the documentation block at the top?

It's based on the name of the application. I believe it's the same thing as applicationName.

I understand that as gradlew is a build artefact it doesn't technically need to be patched, but my inability to do local testing is one reason that I've left these patches in place, so that if there are any questions about specific commits, or any changes requested, I can modify gradlew and then use that as the basis for patching the template. Let me know if you need these stripped out, or whether you're OK to simply let them be overwritten by the next nightly build after merging.

Probably what I'll do is squash your changes into another branch and remove those changes so other people don't get confused why gradlew changed without a nightly version bump. So it's OK for you to keep these changes if it's helping you test locally.

Also, I'm rebasing my changing onto master before each submission to ensure a clean merge. What's your policy on doing that?

For the most part, you don't need to do that unless we ask. Sometimes we make CI changes that require old branches/PRs to be rebased, but we have a merge queue that does the merge to master. For where you're making changes, it's very unlikely anyone else is going to make changes in that area right now.

Do you need me to write any additional tests, seeing as how there's "in theory" no modified behaviour.

I think we can rely on our existing coverage plus the RC process to identify problems. I've already run an earlier version of your changes through our CI pipeline and I didn't find anything surprising.

I can think of some cases where behaviour might change:

I think if the behavior is different as you describe, it will be OK as well. I'll double check that we're running some of our tests through a cygwin shell on Windows.

Also for Cygwin & Msys, should the expansions of $DEFAULT_JVM_OPTS, $JAVA_OPTS, and $GRADLE_OPTS be subject to file path cygpath rewriting? (They weren't, and they're still not, but should they be?)

No, let's leave this as-is. It would be unusual for any of those to include file paths without it being a part of a larger argument. e.g., -DFOO=/path/to/foo would be hard to parse out and cygpath rewrite.

The die and warn functions (in lines 119-128) should probably write to stderr rather than stdout.

+1, this is a good idea for die. I don't think there's any good reason it's not doing that now. warn is OK on stdout.

The command ulimit -H -n at line 169 is used to compute a value which may wind up not being used; in that case there's no point issuing a warning at line 177.

+1, I think ditching the warning makes sense to me.

Should Darwin really be excluded from setting ulimit -n? Darwin has a BSD UNIX kernel, so it should just work.

From the history, I can't immediately tell if this was intentional or not. Let's leave this as-is and get the rest of the improvements you made into master first.

Signed-off-by: Martin D Kealey <[email protected]>
Signed-off-by: Martin D Kealey <[email protected]>
Signed-off-by: Martin D Kealey <[email protected]>
Rather than enumerating all directories matching /*, combining them into a
regex, and then checking the first component of each arg against that regex,
instead simply check whether the first component of each arg is a directory.

Signed-off-by: Martin D Kealey <[email protected]>
Add #( in case statements to balance brackets

Signed-off-by: Martin D Kealey <[email protected]>
@kurahaupo
Copy link
Author

kurahaupo commented Jul 11, 2021

(This explains a change that will be squashed when I next push my branch.)

Prompted by your use of quotes on line 73 I looked more closely at all the quotes and assignments.

The short version is that I found a few places where I'd messed up by removing quotes (because I didn't account for templating), but if I simply put them back the result is still problematic: Groovy can in principle supply variables that contain quote marks and other shell metacharacters.

So I'm working on a new method that looks like

SHELL_VAR=\$( cat ) <<EndOfGroovySubstitution
$groovyVar
EndOfGroovySubstitution

which means that it's possible for $groovyVar to expand to $OTHER_SHELL_VAR and have the shell expand that in turn, but without risking blowing up the shell if there are quote marks in $groovyVar. (Such quote marks might not lead to the desired outcome of course, but the error reporting will at least make sense.)

This doesn't apply in all cases; java class names consist of alphanumerics and dots, so they're safe for the shell, and the odd case where 'mainClass' consists of --module and a Java class name is also safe. However perhaps this should be documented.

(Edit note: I had intended to use read however without -r it mangles backslashes, and with -r it's not POSIX.)

@kurahaupo
Copy link
Author

kurahaupo commented Jul 11, 2021

In addition to revamping the assignments above, I'm also looking at using a related technique to remove this sed+eval pairing:

eval "set -- $(
        printf '%s\n' "$DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS" |
        xargs -n1 |
        sed ' s~[^-[:alnum:]+,./:=@_]~\\&~g; ' |
        tr '\n' ' '
    )" '"$@"'

While it's a bit simpler than it was, I'm sure it's still a mysterious black box for most script writers.

The naïve approach to replacing it would be to write something like this:

echo "$DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS" |
xargs -n1 |
while read -r line ; do
    set -- "$@" "$line"
done

however that won't work in all POSIX shells, because POSIX allows the shell to run every part of a pipeline in a separate subshell. And read -r isn't POSIX before 2018.

It will work for Ksh, and it will work for newer version of Bash if prefaced by shopt -s lastpipe (which of course will make Ksh complain). But if one has Bash (and still doesn't care about POSIX) then one can also use process substitution:

while read -r line ; do
    set -- "$@" "$line"
done < <(
    echo "$DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS" |
    xargs -n1
)

In theory this should work in any POSIX shell:

while line=$( line ) ; do
    set -- "$@" "$line"
done <<EndOfExpansion
$(
    echo "$DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS" |
    xargs -n1
)
EndOfExpansion

but almost nobody has heard of the line utility, so it wouldn't surprise me if some Linux systems classify it as "optional" and don't install it in a base system.

Beyond that, there is the option of a pure-shell version, that only uses shell built-ins, but let's just say, including a tokenizing lexer written in shell is not exactly an enhancement to overall readability:

word_list="$DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS"
unset next in_quote   # initial Lexer state:
while [ -n "$word_list" ] ; do
    # pop one character off the front of word_list, and then decide
    t=${word_list#?}
    h=${word_list%"$t"}
    word_list=$t
    case $h in
      $in_quote)
        # we were in quotes, so we've just found the close quote
        unset in_quote ;;
      ${in_quote-[[:space:]]})
        # we were not in quotes AND the char is whitespace, so it's a word break
        [ -n "${next+X}" ] && set -- "$@" "$next"
        unset next ;;
      \\)
        # backslash escapes the next character:
        # pop the next characer off the front of SHELL_VAR (same as above) ...
        t=${word_list#?}
        h=${word_list%"$t"}
        word_list=$t
        # ... and add it to the current word (same as any ordinary character, below)
        next=$next$h ;;
      [\"\'])
        # we weren't in quotes, so we are now; 
        in_quote=$h
        # if "next" is unset, make it set-but-empty instead, so that an empty quoted string won't be ignored
        next=$next ;;
      *)
        # any ordinary character gets added to the current word (or starts a new word)
        next=$next$h ;;
    esac        
done
# gather up last word, if any
[ -n "${next+X}" ] && set -- "$@" "$next"

... but at least there's no "eval", so we know it can't do anything too weird.

@kurahaupo
Copy link
Author

(in reference to the script name)

It's based on the name of the application. I believe it's the same thing as applicationName.

The script name is gradlew while $applicationName expands to Gradle. Perhaps something like ${applicationName.toLower()}w might work, but it feels really kludgy and I'm trying to get rid of kludges.

I'll just try to get rid of named references to the script within itself.

@kurahaupo
Copy link
Author

Probably what I'll do is squash your changes into another branch and remove those changes so other people don't get confused why gradlew changed without a nightly version bump. So it's OK for you to keep these changes if it's helping you test locally.

I'll do a rebase and strip them myself once we're agree that it's all satisfactory; that way I get to keep the attribution. And it will keep the changes segmented in separate commits for anyone who looks back and wonders "why".

@kurahaupo kurahaupo force-pushed the posix_shell branch 2 times, most recently from 8a26a30 to 06f567f Compare July 11, 2021 13:01
Add some notes explaining the stuff I wish that I'd known when I started
editing gradlew.

Signed-off-by: Martin D Kealey <[email protected]>
even within Gradle

Signed-off-by: Martin D Kealey <[email protected]>
Signed-off-by: Martin D Kealey <[email protected]>
Signed-off-by: Martin D Kealey <[email protected]>
@big-guy
Copy link
Member

big-guy commented Jul 12, 2021

Groovy can in principle supply variables that contain quote marks and other shell metacharacters.

I think this is OK to ignore for the changes you're doing now. This was likely already a problem and there will be other places where assumptions have been made about the variables substituted in the script. We could also move the escaping up into the template expansion (in Java code) to handle this if that would make the shell script simpler.

@big-guy big-guy added this to the 7.2 RC1 milestone Jul 12, 2021
@bot-gradle bot-gradle merged commit 82a4cb6 into gradle:master Jul 15, 2021
@big-guy
Copy link
Member

big-guy commented Jul 15, 2021

@kurahaupo Just to keep you in the loop on things.

First off, thanks for the contribution. I think you've really improved what we had and being able to keep POSIX compatibility is a big plus.

We're trying to prepare for the next release (7.2), so I merged this PR into a separate branch and reworked a couple of things: #17716

  1. Like I said above, I just reverted the changes to gradlew.
  2. I also added some additional test coverage for questions that I had with some of the changes.
  3. I removed the use of SHELL_VAR=\$( cat ) <<EndOfGroovySubstitution for substitutions. This seemed to be the source of some test failures, so I checked that we escape all/most of shell metacharacters before substituting them. It's unlikely that these substitutions would actually contain metacharacters anyways.
  4. I also tried to make some of our existing tests less sensitive to the exact implementation of the script (e.g., we used to check for an exact number of lines).

I've built a new nightly with your changes and I've submitted another PR to bump the version we use on gradle/gradle:
#17733

If you'd like to try out the latest changes, you can use upgrade to this nightly:

./gradlew wrapper --gradle-version=7.2-20210715231856+0000 && ./gradlew wrapper

The only thing that I found that seems to be broken has to do with unbalanced quotes. It seems like you used to be able to pass these through environment variables, but you can't do that any longer. I can't really think of a good reason you'd want to do that and I think the benefits of all the other work you did outweigh this breakage (if it is one). I'm going to include a note about this in our upgrade guide, but I don't think we need to fix it without a good example.

If there's more that you'd like to improve, please feel free to submit another PR with those changes and ping me.

@kurahaupo
Copy link
Author

yeah I noticed that some of the tests look for explicit lines in the generated files, so I adjusted them but perhaps I missed something.

The issue is not just unbalanced quotes; it's that different quotes are required/allowed inside different variables; whichever is not used to enclose the assignment in the template.

You can get away with using the same quote type as long as they are balanced. If the groovy substitution result in say
var=""foo""
then that is valid but foo is UNquoted, so if it contains other shell meta-characters (space, tab, newline, and ;&()|<>) it will break.

If single quotes are used in the template to prevent $ and ` from triggering expansion, then having single quotes in the value inserted by the template engine will defeat that, leaving them subject to expansion.

Really this should be fixed by having groovy insert \ before each shell meta-character, but that might still need adjustments to the test suite.

That in turn means that Groovy should be made aware of what is or isn't an array value, which on balance is a good thing, but would be a user-visible change.

However you're right that these can be improved as a separate PR.

I've managed to get a local build running so I will drop the latest patch and all the patches to gradlew, and push the branch again when I get home tonight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

unix gradle wrapper fails when path contains $ character

6 participants