Added BeakerScheduler class for handling resource assignment
#407
This allows users to implement their own subclasses to customize how the `BeakerExecutor` allocates resources to run each step.
    return ResourceAssignment(
        cluster=cluster_to_use, resources=task_resources, priority=self.priority
    )
Is this method allowed to return a value that says "We can't schedule this one right now."? I'm thinking of a scheduler that doesn't queue more than a few jobs ahead of time, and then waits to see which cluster frees up first. Though once again that's a problem that goes away when Beaker fixes https://github.com/allenai/beaker/issues/2544.
At the moment, no. But that would be nice. It would take some refactoring though so should probably be a separate PR.
Oh! Just kidding, that was easy: 76e5e4d
And cb438bd
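One way to picture a scheduler that declines a step for now, as asked about above, is to have it raise an exception instead of returning an assignment. The sketch below uses stand-in definitions so it runs on its own: `ResourceAssignment` and `ResourceAssignmentError` take their names from this PR's diff, while the `schedule` method name, `LimitedQueueScheduler`, `max_queued`, `queued`, and `job_finished` are hypothetical. The commits referenced above may implement this differently.

```python
from dataclasses import dataclass
from typing import Optional


class ResourceAssignmentError(Exception):
    """Stand-in: raised when a step cannot be scheduled right now."""


@dataclass
class ResourceAssignment:
    """Stand-in for the value returned in the diff above."""

    cluster: str
    resources: Optional[dict]
    priority: str


class LimitedQueueScheduler:
    """Hypothetical scheduler that never queues more than `max_queued` jobs."""

    def __init__(self, cluster: str, max_queued: int = 2, priority: str = "normal"):
        self.cluster = cluster
        self.max_queued = max_queued
        self.priority = priority
        self.queued = 0  # jobs handed out and not yet finished

    def schedule(self, step_name: str) -> ResourceAssignment:
        # Decline instead of queueing further ahead than allowed.
        if self.queued >= self.max_queued:
            raise ResourceAssignmentError(f"no capacity for '{step_name}' right now")
        self.queued += 1
        return ResourceAssignment(
            cluster=self.cluster, resources=None, priority=self.priority
        )

    def job_finished(self) -> None:
        # Called by the (hypothetical) caller when a job completes.
        self.queued -= 1
```

The caller is expected to catch the exception and try the same step again later, which keeps the "not right now" case separate from a real failure.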
        )
        steps_left_to_run.discard(step)
    elif isinstance(exc, ResourceAssignmentError):
        submitted_steps.discard(step_name)
This doesn't mean they are discarded forever, does it?
Nope, they get put back into the queue, or I guess kept in the queue. The important queue here is `steps_left_to_run`.
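The bookkeeping described here can be illustrated with a simplified, synchronous sketch. It is not the executor's actual code: `run_all`, `try_submit`, and `max_rounds` are hypothetical, the exception class is a stand-in, and steps are tracked by name only. Only the two set names come from the diff.

```python
class ResourceAssignmentError(Exception):
    """Stand-in for the exception checked in the diff above."""


def run_all(step_names, try_submit, max_rounds=10):
    """Submit steps until none are left or `max_rounds` passes have been made.

    `try_submit` is a hypothetical callable that raises
    ResourceAssignmentError when a step cannot be scheduled yet.
    Returns the set of steps that were never submitted successfully.
    """
    steps_left_to_run = set(step_names)
    submitted_steps = set()
    for _ in range(max_rounds):
        if not steps_left_to_run:
            break
        # sorted() makes a copy, so the set can be modified while looping.
        for step_name in sorted(steps_left_to_run):
            submitted_steps.add(step_name)
            try:
                try_submit(step_name)
            except ResourceAssignmentError:
                # Only the "submitted" marker is dropped; the step stays in
                # steps_left_to_run and is retried on the next pass.
                submitted_steps.discard(step_name)
            else:
                steps_left_to_run.discard(step_name)
    return steps_left_to_run
```

A step that hits `ResourceAssignmentError` is removed from `submitted_steps` only, so it is picked up again on the next pass over `steps_left_to_run`.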
Here's an example of a custom implementation that makes jobs preemptible if they're not run on AllenNLP clusters: https://github.com/allenai/tango-beaker-template/blob/0346e8719388cf8f4cc0c80ac713ae14f570f7e0/scheduler.py
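For readers who don't follow the link, here is a rough, self-contained sketch of what such a subclass could look like. It is not the linked file and does not import Tango: the `BeakerScheduler` and `ResourceAssignment` classes below are stand-ins, and the `schedule` signature, the round-robin cluster choice, the `own_prefix` parameter, and the cluster names are all assumptions made for illustration.

```python
from itertools import cycle
from typing import NamedTuple, Optional, Sequence


class ResourceAssignment(NamedTuple):
    """Stand-in with the three fields used in this PR's diff."""

    cluster: str
    resources: Optional[dict]
    priority: str


class BeakerScheduler:
    """Stand-in base class; the real one in Tango may differ."""

    def schedule(self, step_name: str) -> ResourceAssignment:
        raise NotImplementedError


class PreemptibleElsewhereScheduler(BeakerScheduler):
    """Hypothetical: normal priority on 'own' clusters, preemptible elsewhere."""

    def __init__(self, clusters: Sequence[str], own_prefix: str = "ai2/allennlp"):
        if not clusters:
            raise ValueError("at least one cluster is required")
        self._clusters = cycle(clusters)  # simple round-robin choice
        self.own_prefix = own_prefix

    def schedule(self, step_name: str) -> ResourceAssignment:
        cluster_to_use = next(self._clusters)
        if cluster_to_use.startswith(self.own_prefix):
            priority = "normal"
        else:
            priority = "preemptible"
        return ResourceAssignment(
            cluster=cluster_to_use, resources=None, priority=priority
        )
```

With the made-up clusters `["ai2/allennlp-example", "ai2/other-example"]`, the first step would get normal priority and the second would be marked preemptible.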