-
Notifications
You must be signed in to change notification settings - Fork 52
GCSWorkspace #417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GCSWorkspace #417
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new abstractions. My main question is about remote locking. It seems like GCSFs does have the notion of "committing" a file, similar to Beaker (https://gcsfs.readthedocs.io/en/latest/api.html#gcsfs.core.GCSFile.commit), but it doesn't look like we use that feature? From what I could tell, the "locking" mechanism in GCSClient works by checking for the existence of some placeholder file? Does gcs_fs.touch(placeholder_file) fail if another process beats it by touching that file first?
General comments:
- I think
tango/integrations/(beaker|gs)/common.pyand...util.pycan be merged into the same file.
| It stores the results of steps on some RemoteWorkspace. | ||
| It also keeps a limited in-memory cache as well as a local backup on disk, so fetching a | ||
| step's resulting subsequent times should be fast. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| step's resulting subsequent times should be fast. | |
| step's result subsequent times should be fast. |
That commit() works for files rather than folders. But perhaps I can utilize it recursively (Looking into it). From what I understand, beaker also performs commit at the folder level, right?
The placeholder file is for creation of any folder, since you can't create an empty folder in gcs.
Agreed. |
|
I have not looked at the code, just your comments on here. In general, lock files don't work on a datastore like GCS, because they are eventually consistent. Different machines might see different states for a while. But maybe GCS has some magic to make sure states are consistent across machines (and operations like creating a file appear atomic)? |
|
Oh wow! https://cloud.google.com/storage/docs/consistency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool!! And fast!
My only comments are about naming. I feel like we should call this integration "gcs" instead of "gs"? Or possibly "gc" for Google Cloud, since we're not just using GCS.. we're also using Datastore.
|
|
||
| def __init__( | ||
| self, | ||
| workspace: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense to call this "bucket"?
@epwalsh My main reason for calling it "gs" rather than "gcs" was to keep things uniform with gsutil. "gs://workspace-bucket" is a valid location that people can copy-paste and use with cached-path or gsutil directly. |
|
Fair enough! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's entirely possible that I missed it and it's already there, but we need retries on all network calls, GCS and Datastore.
Cloud storage and Datastore both have retries already implemented. The default configuration is here . Do you mean that we should have more control over the retries? We can customize them. |
No, if we already have retries set up, this is fine. I like their default settings, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is one big feature!
Changes proposed in this pull request:
RemoteClient,RemoteStepCache,RemoteWorkspace. TheRemoteClientmodels the beaker client (in fact, for beaker, it's mostly a thin wrapper right now).Cleanup:
Testing with BeakerExecutor: