Description
Overview
This issue tracks the integration of the "New KAD-DHT Provide system" with "Reprovide Sweep" strategy from go-libp2p-kad-dht into Kubo 0.38+.
The modernized Provider system and the related interface/architecture refactor significantly improve DHT content republishing efficiency by exploring "keyspace regions" instead of providing keys one by one, and by spreading reprovide operations evenly over time to avoid performance bursts.
Key Benefits:
- More efficient key republishing: the current system can take up to 10 seconds per key in the worst case, which limits the default DHT client to roughly 8,000 keys per 22-hour reprovide interval. The new system raises that ceiling.
- Enables backend optimizations, concurrent provide operations with error handling and retry mechanisms, dynamic prefix length estimation for keyspace exploration, etc.
More details in libp2p/go-libp2p-kad-dht#1082 and libp2p/go-libp2p-kad-dht#1095
Dependencies
The following must be completed before this integration:
- boxo integration:
- go-libp2p-kad-dht consolidation:
  - Changes from issues linked in Reprovide Sweep libp2p/go-libp2p-kad-dht#1095 need to be consolidated in the `provider` branch
- Kubo dependencies:
  - Update Kubo to depend on latest boxo and kad-dht with necessary changes
  - Provide newly received blocks according to reprovide strategy #10837
  - MFS provides strategy: Plan and implement solution for handling MFS provides to not break when `Reprovider.Strategy=mfs|pinned+mfs`
Implementation Tasks
Core Integration
- Update go.mod dependencies to include latest boxo and kad-dht versions
- Add new `Internal.DHTProviderSweepSystem` (name tbd) `Flag` configuration option (mark as experimental, opt-in, disabled by default for now; in the future it will flip to true by default, and eventually we will remove it)
  - purge (re)provider queues when flag state changes, similar to how we do it for `Reprovider.Strategy`
- Document in `changelog` and `docs/config.md`
- Add forced `ipfs provide clear` when switching to/from new system (similar to existing `Reprovider.Strategy` changes)
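The purge-on-change behaviour could be sketched as follows. Everything here is hypothetical: the `Flag` type is a simplified tri-state stand-in (unset/true/false) defined locally, `queue` is an in-memory slice rather than a persisted provider queue, and `applyFlag` is an invented helper. The point is only that the last resolved value is remembered and the queue is cleared when it differs.

```go
package main

import "fmt"

// Flag is a simplified tri-state option: unset falls back to a default.
type Flag int

const (
	Default Flag = iota
	True
	False
)

// WithDefault resolves the flag, using def when it is unset.
func (f Flag) WithDefault(def bool) bool {
	switch f {
	case True:
		return true
	case False:
		return false
	default:
		return def
	}
}

// providerState stands in for whatever would be persisted between runs.
type providerState struct {
	lastSweepEnabled bool
	queue            []string // pending CIDs, as plain strings for brevity
}

// applyFlag resolves the flag (disabled by default) and purges the queue
// if the resolved value changed since the last run. It reports whether
// a purge happened.
func (s *providerState) applyFlag(f Flag) bool {
	enabled := f.WithDefault(false)
	if enabled == s.lastSweepEnabled {
		return false
	}
	s.queue = nil
	s.lastSweepEnabled = enabled
	return true
}

func main() {
	s := &providerState{queue: []string{"cid-a", "cid-b"}}

	purged := s.applyFlag(Default) // unset resolves to false: no change
	fmt.Println(purged, len(s.queue)) // false 2

	purged = s.applyFlag(True) // opt-in: state changed, queue is purged
	fmt.Println(purged, len(s.queue)) // true 0
}
```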
Metrics and Observability
- Implement and document DHT provider record puts metric (ideally in go-libp2p-kad-dht - perhaps we already have one?) that counts raw DHT publish events
  - @guillaumemichel: we currently have this metric
- Expose metrics via Kubo's existing metrics endpoints (`/debug/metrics/prometheus`) and document in changelog
- Ensure metric works for both old and new provide systems for performance comparison
  - @guillaumemichel: the new system metric corresponds to the sum of both old system counters
- Add metric to collab cluster Grafana board to visualize average provide rate when provide system is working (for A/B test)
- provider: display stats for new provide system #10900
RPC/CLI Command Updates
TBD; provisionally we want to move everything related to provide/reprovide under the `ipfs provide` namespace.
- Update `ipfs routing provide|reprovide` commands to work with new system (if possible/feasible; if not, return informative error to use new commands in `ipfs provide`)
- Similar for `ipfs stats provide` and `ipfs stats reprovide` (wire up, or update to return error until properly wired up for new system)
- Ensure `ipfs provide clear` works correctly with new system, and that queue is automatically purged when
- Direct users to use modern `ipfs provide` commands (update `--help` of deprecated commands)
Configuration
- Add configuration option in Kubo config (opt-in initially)
- Update `docs/config.md`
- Include migration guidance and performance comparison information (anecdotal A/B from collab cluster) in changelog
- Document breaking changes and command deprecations in changelog
Testing Requirements
End-to-End Regression Tests
TBD; we may not have tests for the different `Reprovider.Strategy` values. If not, we need to add them to catch regressions. Avoid Sharness if possible; prefer E2E Go tests in `test/cli`.
- Test all existing `Reprovider.Strategy` options continue to work
  - With defaults (old backend)
  - With new provide system (opt-in to new backend)
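A table-driven layout is one way to cover every strategy against both backends. The sketch below only builds the matrix. The strategy list is an assumption based on my reading of the documented `Reprovider.Strategy` values and should be checked against `docs/config.md`; `testCase` and `buildMatrix` are hypothetical names. In `test/cli`, each case would presumably become a `t.Run` subtest that starts a node with the corresponding config.

```go
package main

import "fmt"

// testCase describes one strategy/backend combination to exercise.
type testCase struct {
	name     string
	strategy string
	sweep    bool // true = opt in to the new provide system
}

// buildMatrix returns every strategy paired with the old and the new backend.
func buildMatrix(strategies []string) []testCase {
	var cases []testCase
	for _, s := range strategies {
		for _, sweep := range []bool{false, true} {
			backend := "old"
			if sweep {
				backend = "new"
			}
			cases = append(cases, testCase{
				name:     fmt.Sprintf("%s/%s-backend", s, backend),
				strategy: s,
				sweep:    sweep,
			})
		}
	}
	return cases
}

func main() {
	// Assumed strategy values; verify against docs/config.md.
	strategies := []string{"all", "pinned", "roots", "mfs", "pinned+mfs"}
	for _, tc := range buildMatrix(strategies) {
		fmt.Println(tc.name)
	}
}
```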
Performance Validation
- Kubo PR with boxo and kad-dht + config wired up
- Deploy staging image to 2 collab cluster boxes for A/B, opt-in provide on one of them
- write down results on the Kubo PR with integration / changelog
Breaking Changes
- Provider Queue Reset: Switching to/from the new reprovide sweep system will force `ipfs provide clear` to ensure provider queues are reset
- Command Deprecation: when opted in to the new system, `ipfs routing provide|reprovide` commands may return errors directing users to modern `ipfs provide` commands (TBD, maybe we can keep them working)
- Stats Commands: when opted in to the new system, `ipfs stats provide` and `ipfs stats reprovide` will return errors until properly implemented (TBD, maybe we can keep them working, but we should mark them as deprecated)
- Opt-in Required: Existing users will continue using the old system; new system requires explicit configuration change
Success Criteria
- new provide system available as opt-in configuration option
- existing functionality preserved for users not opting in
- metrics show improved provide performance (higher average provide rate / shorter provide window)
- end-to-end tests pass for
  - all reprovider strategies with old backend
  - all reprovider strategies with new backend
- new provide system stats RPC/CLI wired up in `ipfs stats provide|reprovide` `ipfs provide stat`
- new provide system RPC/CLI for manual `provide <cid>`/`reprovide`
- documentation and changelog
Related Issues and PRs
- Source PRs
  - feat: Reprovide Sweep libp2p/go-libp2p-kad-dht#1082 (Reprovide Sweep implementation)
  - Reprovide Sweep libp2p/go-libp2p-kad-dht#1095 (Reprovide Sweep review)
- Dependencies
  - Remove providing Exchange. Call Provide() from relevant places. boxo#976 (boxo integration)
  - Provide newly received blocks according to reprovide strategy #10837 (MFS provides issue)
- Background