Skip to content

Conversation

eyal0
Copy link

@eyal0 eyal0 commented Apr 22, 2021

The skip-save option allows users to force the cache to skip saving the cache. skip-save can be set to a constant or to an expression that will be evaluated at the end of the CI job, such as an environment variable.

This is for #498

@eyal0 eyal0 requested a review from a team as a code owner April 22, 2021 03:34
@eyal0
Copy link
Author

eyal0 commented Apr 22, 2021

@dhadka Please take a look at this. It implements the "skip-save" idea that you had.

Testing is here: https://github.com/eyal0/test-cache/runs/2406514627?check_suite_focus=true

If you look at the log of the CI, you'll see that SKIP_FOR_FOO was set to true so only the foo cache is skipped. The bar cache is not skipped. This is apparent later in the log where it says:

Cache saved with key: bar-560227097

and

Cache saving was disabled by setting skip-save.

This should fix up a lot of the issues that were filed. One thing that it won't fix is forcing a save even when CI fails. That was the request in #92. It had 72 users that want this feature so, you know, kind of popular! What to do about this? My #498 could partially solve this because it has the "always/never" mechanic where you can specify that you want to force a save, even if there is a failure! We would just remove the post-if and check for success inside the code.

What are your thoughts? Seems like it would be important to get this right the first time, rather than modify the API to caching many more times!

The skip-save option allows users to force the cache to skip saving the cache.  skip-save can be set to a constant or to an expression that will be evaluated at the end of the CI job, such as an environment variable.

This is for actions#498
@eyal0
Copy link
Author

eyal0 commented Apr 22, 2021

Maybe skip-save should have no default value and the code can detect this and perform the default behavior. And when skip-save is false then it will always save, like was wanted in #92? And when true, then it'll skip the save, like the other issues requested.

That makes skip-save a tri-state variable, though: true, false, and default. What's worse, it has a sort of negative logic where true means don't save and false means "do save. So true means no and false means yes. 😞 This is going to be confusing for people, I'm certain.

Anyway, we could leave it as-is and have the default for skip-save be failure(). Then saving would be skipped when there is a failure, just like in the post-if. And then remove the post-if. That would fix #92 because the requesters of that feature could just set the skip-save to false and then it would save even if the job fails, which is what they wanted. Default behavior is unchanged. But then skip-save false is more like a "force save", which isn't intuitive from the name in my opinion. Maybe better to call it, like, should-save or save-condition and let the default be success(). Users that never want to save put false, always want to save (except for primary key hit) put true, and the default is the current behavior. Of course, users can use an environment variable in there and control the true/false during the job if they want.

What do you think is the best solution?

@dhadka
Copy link
Contributor

dhadka commented Apr 26, 2021

@eyal0 Thanks for putting this together! I'm out this week but CC'ing @konradpabjan for review.

@ollydev
Copy link

ollydev commented Apr 27, 2021

I vote for save-condition it covers both of the most requested features: skip save & save even if failed.
Providing it can be controlled during the job (like you said) this is great. 👍

@eyal0
Copy link
Author

eyal0 commented Apr 29, 2021

This fixes #272

Copy link
Contributor

@konradpabjan konradpabjan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @eyal0 !

I really like the direction of this PR!

A few things

  • What's up with package-lock.json having so many changes?
  • Documentation would be great! A few updates to README.md would go a long way in terms of documenting how to use this

In terms of naming, I think skip-save is good 👍

action.yml Outdated
description: 'The chunk size used to split up large files during upload, in bytes'
required: false
skip-save:
description: 'Whether or not to skip the saving of the cache. If this is set to "true", the cache will not be saved. It can also be set to an evironment variable such as "\$\{\{ env.MY_SKIP_SAVE \}\}". If the MY_SKIP_SAVE environment variable is "true" by the end of the CI, caching will be skipped. Default is to update if and only if there was no primary-key exact cache hit.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this comment is really long, can you use > to split it up so it's more readable.

Here is an example: https://github.com/actions/checkout/blob/25a956c84d5dd820d28caab9f86b8d183aeeff3d/action.yml#L12-L24

@eyal0
Copy link
Author

eyal0 commented Apr 29, 2021

  • What's up with package-lock.json having so many changes?

It's some npm thing I guess? I don't know much about node.js package management. I think that when I typed npm install to get all the dependencies, it might have updated that. Maybe there is someone that knows something about npm that can help us out? I just barely even know javascript. :-)

Documentation is a good idea, for sure. I'll do that.

What do you think about #571 (comment) and #92? this is a place where it probably doesn't make sense to have an incremental solution. We don't want to release a new parameter, modifying the API, and then have to do it again! Better to get it right the first time. The concerns in #92 seem totally reasonable and this PR will definitely not address them.

I think that a better way to think about this PR is not just how to fix the problem but rather, to think about: How do we want the API to behave?

@dhadka
Copy link
Contributor

dhadka commented May 5, 2021

What's up with package-lock.json having so many changes?

I think this is because package.json specifies the versions with ^, such as ^1.0.5, which I believe means it will update to the latest version of that dependency (excluding a major version [breaking] change) whenever npm update or npm install is run.

@rcowsill
Copy link

rcowsill commented May 9, 2021

What's up with package-lock.json having so many changes?

I think this is because package.json specifies the versions with ^, such as ^1.0.5, which I believe means it will update to the latest version of that dependency (excluding a major version [breaking] change) whenever npm update or npm install is run.

It looks like eyal0 is using NPM 7, which has upgraded the package-lock.json to "lockfileVersion: 2" format. The installed package versions are the same as the ones in the current lock file.

The PR doesn't modify the package.json, so it should be safe to discard the changes to package-lock.json.

@eyal0
Copy link
Author

eyal0 commented May 9, 2021

I think that I've addressed all comments. In order to keep the commit history, I haven't rebased. If you want, rebase and squash.

@eyal0
Copy link
Author

eyal0 commented May 9, 2021

(I still worry about #92 . That is a reasonable request and we haven't left room to fix it here.)

@dhadka
Copy link
Contributor

dhadka commented May 11, 2021

I think that a better way to think about this PR is not just how to fix the problem but rather, to think about: How do we want the API to behave?

Just thinking of options to unify this and #92...

If we offered separate save and restore steps, we could essentially move the post-if logic, which is defined by the Action, to the if: condition in a user's workflow. For example:

Always save cache:

- uses: actions/cache-restore@v2
  path: foo/
  key: ${{ runner.os }}-foo-${{ hashFiles('**/package-lock.json') }}
- ...
- uses: actions/cache-save@v2
  path: foo/
  key: ${{ runner.os }}-foo-${{ hashFiles('**/package-lock.json') }}
  if: always()

Skip saving if env var set (or any other valid expression):

- uses: actions/cache-restore@v2
  path: foo/
  key: ${{ runner.os }}-foo-${{ hashFiles('**/package-lock.json') }}
- ...
- uses: actions/cache-save@v2
  path: foo/
  key: ${{ runner.os }}-foo-${{ hashFiles('**/package-lock.json') }}
  if: env.SKIP_SAVE != 'true'

Other useful scenarios I can think of are:

  1. Saving the cache earlier in a workflow
  2. Skip saving on certain branches (to avoid going over the 5 GB quota)

@eyal0 thoughts on this approach?

(We'd of course continue to offer the current action in its current form for those that just need the default behavior. This approach has some downsides like needing to specify the path / key in both steps, maintaining multiple actions, more burden on users to figure out the correct setup, etc. There are also some implementation details we'd need to think about...the Action currently passes some info from the restore to the save step using state variables, and we'd need to see if and how that's affected when the Action is split into separate steps.)

@shrink
Copy link

shrink commented May 11, 2021

The recent surge in misuse of actions by malicious third parties has highlighted the importance of a security first approach to actions, as has been described by the GitHub Security Lab in Keeping your GitHub Actions and workflows secure: Untrusted input:

When writing custom GitHub Actions and workflows, consider that your code will often run with repo write privileges on potentially untrusted input. Keep in mind that not all GitHub event context data can be trusted equally. By adopting the same defensive programming posture you would employ for any other privileged application code you can ensure that your GitHub workflows stay as secure as the actual projects they service.

The simple and convenient read + write nature of actions/cache is great for adding caching with ease, but it encourages people -- myself included -- to add cache writing without considering the potential for misuse. Although it's increasing the burden during implementation, I agree with @dhadka that the ideal design for this action is save as an opt-in decision (default to not save) and I'd also say that any examples should be updated to provide a clear indication that writing to the cache should only happen from trusted branches (e.g: pipelines on main can write cache, pipelines on Pull Requests should not).

@rcowsill
Copy link

@dhadka Small edit to your first example to add multiple restore keys:

- uses: actions/cache-restore@v2
  path: foo/
  restore-keys: |
    ${{ runner.os }}-foo-${{ hashFiles('**/package-lock.json') }}
    ${{ runner.os }}-foo-
- ...
- uses: actions/cache-save@v2
  path: foo/
  key: ${{ runner.os }}-foo-${{ hashFiles('**/package-lock.json') }}
  if: always()

I think splitting the action up would make the key configuration clearer. The restore action would have a simple list of keys to try in order, while the save action only has a single key.

Perhaps actions/cache-restore would need to output the matched key so that actions/cache-save could be skipped when the key didn't change.

@eyal0
Copy link
Author

eyal0 commented May 11, 2021

I dunno...

For me, I just need somewhere to store and load files, you know? If the action cache were just an FTP server where I could upload and download files and it had a some quota on how many files each user project could store, that'd be fine. All this discussion about how to skip saving or force saving or the inability to overweight files, it's all just users that want to get basic stuff that a file server could do but access is difficult because actions/cache is an API to a file server that is very opinionated about how things should be stored, with primary keys and restore keys and no overwriting and the like.

So if the interface is going to change, just make the action behave like an FTP server where every project gets a little storage. FTP is already well understood and tested. And if people have a strong opinion about how to implement caching on top of an FTP server, cool. Let them right their own actions on top of it.

Anyway, my two cents. I get that there might be all sorts of business reasons why GitHub didn't just say: "Hey, here's an FTP server that you can use for you project, up to 5GB. Go ahead and build your own stuff on top of it, like a cache or whatever." Maybe there's a good reason why it isn't like that.

Copy link

@Noste Noste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward to have this PR merged. Is there anything I can do to help get this in action:main ?

@mashail
Copy link

mashail commented Sep 28, 2021

Hey,
we are looking for this PR to be merged

@bmarick
Copy link

bmarick commented Apr 25, 2022

Any update on this pull request?
I am not familiar with typescript, but I would be happy to help.
Would love to see this feature enabled!

@yadigar-prospr
Copy link

Any update on this pull request? I am not familiar with typescript, but I would be happy to help. Would love to see this feature enabled!

Nothing changed, I had to fork and made development for myself

@eduardomoroni
Copy link

eduardomoroni commented Jun 3, 2022

Related to #795
@aparna-ravindra @Noste @yadigarbz @eyal0 @kotewar
Is there anything I could do to help this getting merged? I'd happy to contribute in any way I can so I don't have to keep a fork for us 😅

@eyal0
Copy link
Author

eyal0 commented Sep 11, 2022

I prefer the other way with the update-env-variable so I'll close this one and instead work on this: #498

I think that it's better.

@kotewar
Copy link
Contributor

kotewar commented Dec 7, 2022

Hey all, 👋🏽

We have created a discussion with a proposal that will solve this problem, do let us know your feedback, thanks 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:granular-control Issues related to granular control in saving or restoring cache

Projects

None yet

Development

Successfully merging this pull request may close these issues.