forked from kubernetes/autoscaler
Labels
kind/discussion (Discussion: engaging others in deciding about multiple options), kind/enhancement (Enhancement, improvement, extension), lifecycle/rotten (nobody worked on this for 12 months; final aging stage), priority/1 (Priority; lower number equals higher priority)
Description
Reason for discussion:
There are currently several CA-MCM interaction issues that we want to fix. One solution is to change the entire way CA and MCM currently work together. This issue is to discuss the feasibility of such approaches.
Terms used in this discussion (to avoid confusion):
- k/CA = kubernetes CA
- g/CA = gardener CA (fork of k/CA)
- new-CA = new CA code we'll implement, which could be a component or a library
Dimensions of discussion:
- Possible goals
  - Goal 1: Use new-CA as a library inside MCM, where the new-CA library only recommends and MCM decides. Currently a g/CA recommendation is binding.
  - Goal 2: Ditch g/CA entirely; design and implement from scratch, basically leveraging kube-scheduler predicates more directly.
    - Get rid of node groups.
    - Benefits:
      - Can support more than the roughly 1000 nodes that CA supports today.
      - Can fit more pods on the nodes.
  - Goal 3: Leverage the current k/CA.
    - Combine MCM into g/CA, so CA runs the MCM controller, and ditch the current standalone MCM controller completely.
    - We would still maintain the fork, but the aim is to leverage the current features and community support that upstream offers.
    - Benefits:
      - Solves "MCM is down while CA is up" kinds of issues.
      - Targeted removal of a machine can be easier.
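Goal 1 (new-CA as a recommending library inside MCM) could be sketched roughly as below. All type, field, and method names here are hypothetical illustrations, not part of any existing CA or MCM API; the point is only the shape of the contract: the library suggests, MCM decides.

```go
package main

import "fmt"

// ScaleRecommendation is a hypothetical, non-binding suggestion the
// new-CA library would hand to MCM. MCM remains the final decider,
// unlike today's g/CA whose recommendation is binding.
type ScaleRecommendation struct {
	MachineDeployment string // target MachineDeployment (Gardener's node-group analogue)
	DesiredReplicas   int32  // suggested replica count
	Reason            string // e.g. "unschedulable pods", "under-utilised nodes"
}

// Recommender is the interface a library-mode new-CA could expose.
type Recommender interface {
	Recommend() ([]ScaleRecommendation, error)
}

// staticRecommender is a toy implementation for illustration only.
type staticRecommender struct{ recs []ScaleRecommendation }

func (s staticRecommender) Recommend() ([]ScaleRecommendation, error) {
	return s.recs, nil
}

func main() {
	var r Recommender = staticRecommender{recs: []ScaleRecommendation{
		{MachineDeployment: "worker-pool-1", DesiredReplicas: 5, Reason: "unschedulable pods"},
	}}
	recs, err := r.Recommend()
	if err != nil {
		panic(err)
	}
	// MCM would now apply its own policy (backoff, quotas, cool-down)
	// before acting on the suggestion.
	for _, rec := range recs {
		fmt.Printf("%s -> %d (%s)\n", rec.MachineDeployment, rec.DesiredReplicas, rec.Reason)
	}
}
```

This split would also make the rollout strategy of running both side by side and comparing recommendations straightforward, since the recommendation is a plain value that can be logged without being applied.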
- High-demand stories (which use the current design)
  - Allow deletion of a node (true deletion API) #227
  - Early abort/backoff support for Gardener nodegroups a.k.a. machinedeployments #154
  - Other relatively smaller bugfixes listed in the CA-MCM board
- Impact of an overhaul on dealing with current problems
  - Current CA functionality that is unpleasant (needs verification):
    - The kube-scheduler config can differ from the scheduler code imported by CA.
    - Limitation of one machine type per node group.
    - Many CLI flags in k/CA, which could confuse customers.
    - Can't handle waitForFirstConsumer PVs.
    - Increasing utilisation of seeds is desired, but doesn't seem achievable with the current CA.
    - Scale-down is treated as secondary; scale-up is treated as the primary goal.
    - Scale-down is not supported in the same RunOnce() flow if a scale-up happened (or until it happens), or while scale-down is in cool-down.
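For context on the waitForFirstConsumer point above: a StorageClass with delayed volume binding defers PV provisioning until a consuming pod is actually scheduled, so the node/zone choice and the volume topology are decided together, which the CA's scheduling simulation handles poorly. A minimal example of such a StorageClass (the name and provisioner are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware           # illustrative name
provisioner: kubernetes.io/no-provisioner  # e.g. pre-provisioned local volumes
volumeBindingMode: WaitForFirstConsumer    # bind/provision only once a pod consuming the PVC is scheduled
```

Pods whose PVCs reference such a class stay Pending with volume binding delayed, and a scale-up simulation that ignores the eventual topology constraint can pick a node group in the wrong zone.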
- Time required to be invested (excluding any time spent on the current design and other dev tasks)
  - 1 year minimum.
- Maintenance effort and support
  - Need to deal with all the issues (verifying them) and implement them ourselves, even ones already provided by k/CA.
  - Community support will be lost.
- Rollout strategy (if implementing)
  - Keep the current design running, deploy MCM with a recommending CA (Goal 1), and compare the recommendations.