CA-MCM overhaul #251

@himanshu-kun

Reason for discussion:

Currently there are some CA-MCM interaction issues that we want to fix. One solution is to change the way CA and MCM currently work together entirely.
This issue is to discuss the feasibility of such approaches.

Terms for the discussion (to avoid confusion):

k/CA = kubernetes CA
g/CA = gardener CA (fork of k/CA)
new-CA = new CA code we'll implement which could be a component or a library

Dimensions of discussion:

  1. Possible Goals
    1. Use new-CA as a library inside MCM: the new-CA library only recommends and MCM decides. Currently, g/CA's recommendation is binding

      • Ditch g/CA entirely; design and implement from scratch. Basically, leverage more kube-scheduler predicates directly
      • Get rid of node-groups
      • Benefit:
        • Can support more than 1000 nodes, which is all that CA currently supports
        • Can fit more pods on the nodes
    2. Leverage current k/CA

      • Combine MCM into g/CA, so that CA runs the MCM controller, and ditch the current MCM controller completely
      • We still maintain the fork, but the aim is to leverage the current features and the community support that upstream offers
      • Benefit:
        • Solves issues of the kind where MCM is down but CA is up
        • Targeted removal of a machine can be easier
  2. High Demand stories (which use current design)
  3. Impact of overhaul to deal with current problems
    • Current CA functionality that is unpleasant (needs to be verified)
      • Kube-scheduler config can differ from the scheduler code imported by CA
      • Limitation of 1 machine type per node group
      • Many CLI flags in k/CA, which could confuse customers
      • Can't handle WaitForFirstConsumer PVs
      • We want to increase utilisation of seeds, but this doesn't seem to be done by the current CA
      • Scale-down is treated as secondary, scale-up as the primary goal
        • Scale-down is not supported in the same RunOnce() flow if a scale-up happened (or until it happens), or while scale-down is in cool-down
  4. Time required to be invested (excluding any time spent on current design and other dev tasks)
    • 1 year minimum
  5. Maintenance effort, Support
    • Need to deal with all the issues (verifying them) and implement features ourselves, even if k/CA already provides them
    • Community support will be lost
  6. Rollout strategy (if implementing)
    • Keep the current design running, deploy MCM with the recommendation-only CA (Goal 1) alongside it, and compare the recommendations
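
To make Goal 1 and the rollout idea more concrete, here is a minimal, language-neutral sketch in Python (a real implementation would presumably be in Go alongside MCM). Everything in it is an assumption made for illustration: the `Recommendation` shape, the pool names, and the `decide` and `compare` helpers are hypothetical and do not exist in g/CA, k/CA, or MCM. It shows a non-binding recommendation that the MCM side is free to clamp, and a shadow-mode comparison between what the current design decided and what the recommendation-only CA would have suggested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    """A non-binding scaling suggestion (hypothetical shape)."""
    pool: str   # hypothetical worker-pool name
    delta: int  # +n = add machines, -n = remove machines


def decide(rec: Recommendation, current: int, minimum: int, maximum: int) -> int:
    """MCM-side decision: apply the suggestion, clamped to the pool limits."""
    return max(minimum, min(maximum, current + rec.delta))


def compare(binding: dict[str, int], recommended: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Shadow-mode diff: pools where the two designs disagree on the delta.

    Each value is (delta from the current design, delta from the recommender).
    A pool missing from one side is treated as a delta of 0.
    """
    pools = sorted(set(binding) | set(recommended))
    return {
        pool: (binding.get(pool, 0), recommended.get(pool, 0))
        for pool in pools
        if binding.get(pool, 0) != recommended.get(pool, 0)
    }


if __name__ == "__main__":
    # The library suggests +3, but the pool maximum is 5, so MCM settles on 5.
    print(decide(Recommendation("pool-a", 3), current=4, minimum=1, maximum=5))
    # Only pool-a differs between the running design and the shadow recommender.
    print(compare({"pool-a": 2, "pool-b": 0}, {"pool-a": 1, "pool-b": 0}))
```

Logging the output of `compare` over a period of time would be one way to judge whether the recommendation-only design behaves acceptably before switching over.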

Metadata

Labels

  • kind/discussion: Discussion (engaging others in deciding about multiple options)
  • kind/enhancement: Enhancement, improvement, extension
  • lifecycle/rotten: Nobody worked on this for 12 months (final aging stage)
  • priority/1: Priority (lower number equals higher priority)
