CA-MCM overhaul

**Reason for discussion**:

There are currently some CA-MCM interaction issues that we want to fix. One solution is to change the way CA and MCM work together entirely.
This issue is to discuss the feasibility of such approaches.

**Terms for the discussion (to avoid confusion)**:

* `k/CA` = Kubernetes CA
* `g/CA` = Gardener CA (fork of `k/CA`)
* `new-CA` = new CA code we'll implement, which could be a component or a library

**Dimensions of discussion**:

1) Possible Goals
    1) Use new-CA as a library inside MCM. The new-CA library only recommends and MCM decides. Currently, g/CA's recommendation is binding
        * Ditch g/CA entirely; design and implement from scratch, leveraging more kube-scheduler predicates directly
        * Get rid of node groups
        * Benefits:
            * Can support more than 1000 nodes as CA only supports 
            * Can fit more pods on the nodes

    2) Leverage current k/CA
        * Combine MCM into g/CA, so that CA runs the MCM controller, and ditch the current MCM controller completely
        * We still maintain the fork, but the aim is to leverage the current features and the community support that upstream offers
        * Benefits:
            * Solves issues of the kind where MCM is down while CA is up
            * Targeted removal of machines can be easier
2) High Demand stories (which use current design)
      * https://github.com/gardener/autoscaler/issues/227
      * https://github.com/gardener/autoscaler/issues/154
      * other relatively smaller bugfixes listed in the [CA-MCM board](https://github.com/orgs/gardener/projects/22/views)
3) Impact of overhaul to deal with current problems
      * Current CA functionality that is unpleasant *(needs verification)*
        * The kube-scheduler config can differ from the scheduler code imported by CA
        * Limitation of 1 machine type per node group
        * Many CLI flags in k/CA, which could confuse customers
        * Can't handle `WaitForFirstConsumer` PVs
        * Increasing the utilisation of seeds is desired, but doesn't seem to be achieved with the current CA
        * Scale-down is treated as secondary, scale-up as the primary goal
           * Scale-down is not attempted in the same `RunOnce()` flow if a scale-up happened or is still pending, or while scale-down is in cool-down
4) Time required to be invested (excluding any time spent on current design and other dev tasks)
     * 1 year minimum
5) Maintenance effort, Support 
    * we need to deal with all the issues ourselves (verifying them and implementing fixes), even for functionality that k/CA already provides
    * community support will be lost
6) Rollout strategy (if implementing)
    * keep the current design running, deploy MCM with the recommending CA (Goal 1) alongside it, and compare the recommendations
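
To make the "recommend vs. decide" split of Goal 1 and the comparison step of the rollout strategy more concrete, here is a rough Go sketch. Everything in it is an assumption for illustration only: the `Recommender` interface, the `perPodRecommender` toy implementation, the `Decide` and `Diff` helpers, and the pool names do not exist in g/CA or MCM today. It only shows the shape of the idea: the library returns a non-binding recommendation, MCM applies its own limits, and a shadow run reports where the two designs disagree.

```go
package main

import (
	"fmt"
	"sort"
)

// Recommendation is a hypothetical, non-binding suggestion:
// desired machine count per pool.
type Recommendation map[string]int

// Recommender is a hypothetical interface a new-CA library could expose to MCM.
type Recommender interface {
	Recommend(pendingPods int, current map[string]int) Recommendation
}

// perPodRecommender is a toy stand-in for real scheduling simulation.
// It assumes every machine in `pool` fits `podsPerMachine` pods.
type perPodRecommender struct {
	pool           string
	podsPerMachine int
}

func (r perPodRecommender) Recommend(pendingPods int, current map[string]int) Recommendation {
	rec := Recommendation{}
	for pool, n := range current {
		rec[pool] = n
	}
	// Ceiling division: machines needed to place all pending pods.
	rec[r.pool] += (pendingPods + r.podsPerMachine - 1) / r.podsPerMachine
	return rec
}

// Decide models MCM having the final say: it clamps the recommendation
// to per-pool maximums (pools without a maximum are left unchanged).
func Decide(rec Recommendation, maxPerPool map[string]int) map[string]int {
	out := map[string]int{}
	for pool, n := range rec {
		if max, ok := maxPerPool[pool]; ok && n > max {
			n = max
		}
		out[pool] = n
	}
	return out
}

// Diff returns the sorted names of pools where the binding result of the
// current design and the shadow recommendation differ.
// A pool missing from one side is treated as having 0 machines.
func Diff(binding, shadow map[string]int) []string {
	seen := map[string]bool{}
	var pools []string
	for _, m := range []map[string]int{binding, shadow} {
		for pool := range m {
			if !seen[pool] && binding[pool] != shadow[pool] {
				pools = append(pools, pool)
			}
			seen[pool] = true
		}
	}
	sort.Strings(pools)
	return pools
}

func main() {
	current := map[string]int{"pool-a": 2, "pool-b": 1}
	var r Recommender = perPodRecommender{pool: "pool-a", podsPerMachine: 2}

	rec := r.Recommend(5, current)
	decided := Decide(rec, map[string]int{"pool-a": 4})
	fmt.Println("recommended:", rec)
	fmt.Println("decided by MCM:", decided)

	// Shadow comparison: what the current design applied vs. the recommendation.
	binding := map[string]int{"pool-a": 4, "pool-b": 1}
	fmt.Println("pools that differ:", Diff(binding, rec))
}
```

In a shadow rollout, only the output of `Diff` would need to be logged or exported as a metric; the recommendation itself would not be applied until the comparison looks trustworthy.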

CA-MCM overhaul #251
