Description
How to categorize this issue?
/area quality robustness usability open-source
/kind enhancement cleanup api-change task
What would you like to be added:
I would like etcd-backup-restore to be refactored, i.e. rewritten, with components that can be individually turned on or off and that expose well-defined interfaces and channels which other components, within and outside the system, can easily consume. The replacement for etcd-backup-restore should be lean, modular, well-tested and extensible. I would like the etcd bootstrapping process to be simpler and more maintainable, the memory footprint of etcd DB restoration to be reduced, and garbage collection of snapshots to be simple and easy to understand.
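As a rough illustration of components that can be switched on or off individually, here is a minimal Go sketch. Everything in it is an assumption made for illustration: the `Component` interface, the `snapshotter` and `defragmenter` types, the `--enable-<name>` flag naming and the endpoint paths are hypothetical and not part of any existing code. Per-component metrics registration could follow the same pattern and is left out to keep the sketch dependency-free.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
)

// Component is a hypothetical interface that every steward component
// would implement. The method set is illustrative only.
type Component interface {
	Name() string
	// RegisterEndpoints lets a component add its own HTTP handlers
	// to the shared, always-on server.
	RegisterEndpoints(mux *http.ServeMux)
}

// snapshotter and defragmenter are placeholder components.
type snapshotter struct{}

func (snapshotter) Name() string { return "snapshotter" }
func (snapshotter) RegisterEndpoints(mux *http.ServeMux) {
	mux.HandleFunc("/snapshot/latest", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "no snapshot taken yet")
	})
}

type defragmenter struct{}

func (defragmenter) Name() string { return "defragmenter" }
func (defragmenter) RegisterEndpoints(mux *http.ServeMux) {
	mux.HandleFunc("/defrag/status", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "idle")
	})
}

// EnabledComponents defines one boolean flag per component
// (hypothetical naming: --enable-<name>, default false), parses args,
// and returns only the components whose flag was set.
func EnabledComponents(all []Component, args []string) ([]Component, error) {
	fs := flag.NewFlagSet("steward", flag.ContinueOnError)
	enabled := make(map[string]*bool, len(all))
	for _, c := range all {
		enabled[c.Name()] = fs.Bool("enable-"+c.Name(), false, "enable the "+c.Name()+" component")
	}
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	var active []Component
	for _, c := range all {
		if *enabled[c.Name()] {
			active = append(active, c)
		}
	}
	return active, nil
}

func main() {
	all := []Component{snapshotter{}, defragmenter{}}
	active, err := EnabledComponents(all, []string{"--enable-snapshotter"})
	if err != nil {
		panic(err)
	}
	mux := http.NewServeMux()
	for _, c := range active {
		c.RegisterEndpoints(mux) // only enabled components expose endpoints
		fmt.Println("enabled:", c.Name())
	}
}
```

With this shape, adding a component means implementing the interface and appending it to the list; the flag and endpoint wiring need no further changes.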
Why is this needed:
The issues mentioned above cannot be solved within the old repository, due to testability problems, backward compatibility concerns, and the fact that a change to one component requires changes in every other component anyway. The only way to roll out a new version of etcd-backup-restore is to rewrite its components from scratch, possibly reusing existing code snippets from the old code. The name "etcd-backup-restore" is also no longer accurate, since the code does much more than backups and restorations: it maintains the etcd DB and updates k8s resources such as member leases and EtcdMember resources, which druid uses to compute etcd cluster status and to perform other actions such as compaction or remediation, as specified in the EtcdMember proposal. Accordingly, the name should evolve into something that correctly describes what the code does, which is to manage or look after etcd, like a steward.
Task List:
- Basic project structure
- Components (enabled/disabled by individual config flags on default command; metrics and metering registered per component; additional endpoints registered per component as required)
- [Feature] Always-on HTTP server with default endpoints #2
- [Feature] Observe etcd cluster leadership changes #3
- [Feature] Implement etcd lock mechanism #4
- [Feature] Coordinate etcd defragmentation amongst multiple steward instances #5
- [Feature] Handle etcd alarms #6
- [Feature] Implement lean and well-defined snapstore for object storage providers #7
- [Feature] Implement snapshot compression/decompression #8
- [Feature] Implement snapshotting #9
- [Feature] Implement better garbage collection of snapshots #10
- [Feature] Implement failure-tolerant etcd restoration #11
- [Feature] Implement snapshot compaction #12
- [Feature] Implement etcd data validation #13
- [Feature] Implement simple bootstrapping of etcd #14
- [Feature] Implement configurable updation of EtcdMember resource #15
- [Feature] Update etcd member lease #16
- CLI commands
  - Default (no subcommand)
  - Compact subcommand
  - Copy-Backups subcommand
- Testing
  - Unit tests for all individual functions/methods (small functions, quick tests)
  - Integration tests: each component working in isolation, using an embedded etcd as necessary
  - e2e tests: all components working together, along with etcd-wrapper
  - Load/performance tests: to be run on every PR, to avoid performance regressions in steward components
- Create make targets
- Set up CI/CD pipelines, jobs for tests, builds
- Documentation
  - User: getting started, usage
  - Operator: getting started, deployment, operations
  - Developer: getting started, testing, adding new functionality
  - Concepts: design decisions
- [Upgrade] Move the etcd from `v3.4.26` to `v3.4.34` to `v3.5.x` etcd-druid#445
Dependency Graph
flowchart TD
%% <Legend>
legend --> start
subgraph legend["Legend"]
    direction LR;
    notstarted("Issue is not started"):::notstarted;
    started("Issue is in progress"):::started;
    completed("Issue is done"):::completed;
    notstarted --> started --> completed;
end
%% </Legend>
%% <CSS>
classDef notstarted fill:#FFF,color:#000;
classDef started fill:#fae17d,color:#000;
classDef completed fill:#ccffd8,color:#000;
%% </CSS>
%% <Issues>
start("Start"):::default;
issue1944505244("<a href='https://github.com/gardener/etcd-steward/issues/2' style='text-decoration:none;color: inherit;'>[Feature] Always-on HTTP server with default
endpoints</a>"):::notstarted;
issue1944514155("<a href='https://github.com/gardener/etcd-steward/issues/3' style='text-decoration:none;color: inherit;'>[Feature] Observe etcd cluster leadership
changes</a>"):::notstarted;
issue1399295692("<a href='https://github.com/gardener/etcd-druid/issues/445' style='text-decoration:none;color: inherit;'>[Upgrade] Move the etcd from `v3.4.26` to
`v3.4.34` to `v3.5.x`</a>"):::started;
issue1944525166("<a href='https://github.com/gardener/etcd-steward/issues/4' style='text-decoration:none;color: inherit;'>[Feature] Implement etcd lock mechanism</a>"):::notstarted;
issue1974016962("<a href='https://github.com/gardener/etcd-steward/issues/5' style='text-decoration:none;color: inherit;'>[Feature] Coordinate etcd defragmentation
amongst multiple steward instances</a>"):::notstarted;
issue1974065334("<a href='https://github.com/gardener/etcd-steward/issues/6' style='text-decoration:none;color: inherit;'>[Feature] Handle etcd alarms</a>"):::notstarted;
issue1974091044("<a href='https://github.com/gardener/etcd-steward/issues/7' style='text-decoration:none;color: inherit;'>[Feature] Implement lean and well-defined
snapstore for object storage providers</a>"):::notstarted;
issue1974106269("<a href='https://github.com/gardener/etcd-steward/issues/8' style='text-decoration:none;color: inherit;'>[Feature] Implement snapshot
compression/decompression</a>"):::notstarted;
issue1974249579("<a href='https://github.com/gardener/etcd-steward/issues/9' style='text-decoration:none;color: inherit;'>[Feature] Implement snapshotting</a>"):::notstarted;
issue1978120400("<a href='https://github.com/gardener/etcd-steward/issues/10' style='text-decoration:none;color: inherit;'>[Feature] Implement better garbage
collection of snapshots</a>"):::notstarted;
issue1978162579("<a href='https://github.com/gardener/etcd-steward/issues/11' style='text-decoration:none;color: inherit;'>[Feature] Implement failure-tolerant etcd
restoration</a>"):::notstarted;
issue1978175585("<a href='https://github.com/gardener/etcd-steward/issues/12' style='text-decoration:none;color: inherit;'>[Feature] Implement snapshot compaction</a>"):::notstarted;
issue1979101179("<a href='https://github.com/gardener/etcd-steward/issues/13' style='text-decoration:none;color: inherit;'>[Feature] Implement etcd data validation</a>"):::notstarted;
issue1979143682("<a href='https://github.com/gardener/etcd-steward/issues/14' style='text-decoration:none;color: inherit;'>[Feature] Implement simple bootstrapping of
etcd</a>"):::notstarted;
issue1979171449("<a href='https://github.com/gardener/etcd-steward/issues/15' style='text-decoration:none;color: inherit;'>[Feature] Implement configurable updation
of EtcdMember resource</a>"):::notstarted;
issue1979180211("<a href='https://github.com/gardener/etcd-steward/issues/16' style='text-decoration:none;color: inherit;'>[Feature] Update etcd member lease</a>"):::notstarted;
finish("Finish"):::default;
%% </Issues>
%% <Dependencies>
start --> issue1944505244;
start --> issue1399295692;
start --> issue1944525166;
start --> issue1974065334;
start --> issue1974091044;
start --> issue1979171449;
start --> issue1979180211;
issue1944505244 --> issue1944514155;
issue1944505244 --> issue1974016962;
issue1944505244 --> issue1974249579;
issue1944505244 --> issue1978162579;
issue1944505244 --> issue1978175585;
issue1944505244 --> issue1979101179;
issue1944505244 --> issue1979143682;
issue1399295692 --> issue1944514155;
issue1944525166 --> issue1974016962;
issue1944525166 --> issue1974249579;
issue1974091044 --> issue1974106269;
issue1974091044 --> issue1974249579;
issue1974091044 --> issue1978120400;
issue1974091044 --> issue1978162579;
issue1974106269 --> issue1974249579;
issue1974106269 --> issue1978162579;
issue1974249579 --> issue1978175585;
issue1978162579 --> issue1978175585;
issue1978162579 --> issue1979143682;
issue1979101179 --> issue1979143682;
issue1944514155 --> finish;
issue1974016962 --> finish;
issue1974065334 --> finish;
issue1978120400 --> finish;
issue1978175585 --> finish;
issue1979143682 --> finish;
issue1979171449 --> finish;
issue1979180211 --> finish;
%% </Dependencies>
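The dependency edges in the flowchart above imply an order in which the issues can be worked on. The following Go sketch derives one valid order with Kahn's topological sort. The edges are copied from the flowchart, with the `start` and `finish` nodes omitted and the internal node IDs replaced by the issue numbers from their links; the `deps` variable and the `TopoOrder` function are illustrative names, not part of any existing tooling.

```go
package main

import (
	"fmt"
	"sort"
)

// deps maps an issue to the issues that depend on it. Issues without
// dependents are listed with a nil slice so they still appear in the order.
var deps = map[string][]string{
	"#2":        {"#3", "#5", "#9", "#11", "#12", "#13", "#14"},
	"druid#445": {"#3"},
	"#4":        {"#5", "#9"},
	"#7":        {"#8", "#9", "#10", "#11"},
	"#8":        {"#9", "#11"},
	"#9":        {"#12"},
	"#11":       {"#12", "#14"},
	"#13":       {"#14"},
	"#6":        nil,
	"#15":       nil,
	"#16":       nil,
}

// TopoOrder returns one valid work order using Kahn's algorithm,
// or an error if the graph contains a cycle.
func TopoOrder(graph map[string][]string) ([]string, error) {
	// Count incoming edges for every node, including pure dependents.
	indegree := map[string]int{}
	for from, tos := range graph {
		if _, ok := indegree[from]; !ok {
			indegree[from] = 0
		}
		for _, to := range tos {
			indegree[to]++
		}
	}
	// Nodes with no unmet dependencies can be started immediately.
	var ready []string
	for node, d := range indegree {
		if d == 0 {
			ready = append(ready, node)
		}
	}
	var order []string
	for len(ready) > 0 {
		sort.Strings(ready) // keeps the result deterministic across runs
		node := ready[0]
		ready = ready[1:]
		order = append(order, node)
		for _, to := range graph[node] {
			indegree[to]--
			if indegree[to] == 0 {
				ready = append(ready, to)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return order, nil
}

func main() {
	order, err := TopoOrder(deps)
	if err != nil {
		panic(err)
	}
	for i, issue := range order {
		fmt.Printf("%2d. %s\n", i+1, issue)
	}
}
```

Many orders satisfy the same graph; the string sort used here only makes the output reproducible and says nothing about priority.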