Description
Describe the feature you'd like to have
Add DH-HMAC-CHAP (Diffie-Hellman HMAC Challenge Handshake Authentication Protocol) authentication support to the Ceph NVMe-oF CSI driver, enabling secure authentication between Kubernetes nodes and NVMe-oF storage subsystems.
This feature is required by the PM.
What is DH-CHAP?
DH-CHAP (short for DH-HMAC-CHAP) is the in-band authentication protocol defined in the NVMe specification. It authenticates connections between hosts (initiators) and storage controllers. Think of it as requiring a password before allowing access to storage, except that the secret itself is never sent over the network.
Key Concepts
- Host (Initiator): The Kubernetes worker node that wants to access storage
- Subsystem (Target/Controller): The storage system exposing NVMe volumes through the gateway
- DH-CHAP Key: A cryptographic secret used to prove identity during authentication
- Bidirectional Authentication: Both the host and storage controller prove their identities to each other (strongly recommended for production)
- Unidirectional Authentication: Only the host proves its identity to the storage controller
What is the value to the end user? (why is it a priority?)
Security Benefits
- Multi-Tenant Isolation: In shared Kubernetes clusters, DH-CHAP ensures that workloads from one tenant cannot access storage belonging to another tenant, even if they share the same physical network.
- Man-in-the-Middle Protection: Bidirectional authentication ensures that hosts connect to legitimate storage controllers, not malicious imposters on the network.
How will we know we have a good solution? (acceptance criteria)
Functional Requirements
1. StorageClass Configuration
- Support enable_dhchap: "true/false" parameter to enable/disable authentication
- Support dhchap_mode: "bidirectional/unidirectional" parameter to select authentication mode
- Backward compatible: existing StorageClasses without DH-CHAP continue to work
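A rough sketch of how the two parameters could be validated, written in Python only for brevity (the driver itself would do this in its own code). The names `DhchapConfig` and `parse_dhchap_params` are hypothetical, and defaulting to bidirectional when `dhchap_mode` is omitted is an assumption, not something decided in this issue.

```python
from dataclasses import dataclass
from typing import Mapping

VALID_MODES = ("bidirectional", "unidirectional")


@dataclass(frozen=True)
class DhchapConfig:
    enabled: bool
    mode: str  # "bidirectional", "unidirectional", or "none"


def parse_dhchap_params(params: Mapping[str, str]) -> DhchapConfig:
    """Read enable_dhchap / dhchap_mode from StorageClass parameters."""
    raw_enabled = params.get("enable_dhchap", "false").strip().lower()
    if raw_enabled not in ("true", "false"):
        raise ValueError(f'enable_dhchap must be "true" or "false", got {raw_enabled!r}')
    if raw_enabled == "false":
        # Absent or disabled: behave exactly like a StorageClass without DH-CHAP.
        return DhchapConfig(enabled=False, mode="none")

    # Assumption: an omitted mode falls back to the recommended one.
    mode = params.get("dhchap_mode", "bidirectional").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"dhchap_mode must be one of {VALID_MODES}, got {mode!r}")
    return DhchapConfig(enabled=True, mode=mode)
```

Rejecting unknown values instead of silently disabling authentication keeps a typo in the StorageClass from producing an unauthenticated volume.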
2. Key Management
- Automatically generate cryptographically secure DH-CHAP keys
- Store keys securely in Kubernetes Secrets
- One unique key per node-subsystem connection (isolation)
- One subsystem key per subsystem (for bidirectional mode)
- Automatic key cleanup when resources are deleted
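As an illustration of the key generation requirement, the sketch below builds a secret in the `DHHC-1:<hmac id>:<base64>:` text form that `nvme gen-dhchap-key` emits, as far as I understand that format: random key bytes followed by a little-endian CRC-32 of the key, base64-encoded. The function names are hypothetical, and the exact encoding should be checked against the NVMe specification and nvme-cli before relying on it.

```python
import base64
import secrets
import struct
import zlib

# HMAC identifiers for the second field (00 means the key is used untransformed).
HMAC_IDS = {"none": 0, "sha256": 1, "sha384": 2, "sha512": 3}


def generate_dhchap_key(key_len: int = 32, hmac: str = "none") -> str:
    """Return a fresh secret string built from a CSPRNG."""
    if key_len not in (32, 48, 64):
        raise ValueError("key length must be 32, 48 or 64 bytes")
    raw = secrets.token_bytes(key_len)         # OS-backed secure random bytes
    crc = struct.pack("<I", zlib.crc32(raw))   # 4-byte little-endian CRC-32
    payload = base64.b64encode(raw + crc).decode("ascii")
    return f"DHHC-1:{HMAC_IDS[hmac]:02x}:{payload}:"


def is_well_formed(secret: str) -> bool:
    """Check the framing and CRC of a secret without revealing it."""
    parts = secret.split(":")
    if len(parts) != 4 or parts[0] != "DHHC-1" or parts[3] != "":
        return False
    try:
        blob = base64.b64decode(parts[2], validate=True)
    except ValueError:
        return False
    raw, crc = blob[:-4], blob[-4:]
    return len(raw) in (32, 48, 64) and struct.pack("<I", zlib.crc32(raw)) == crc
```

A check like `is_well_formed` could back the "graceful handling of missing or invalid keys" requirement by catching a corrupted Secret before the connect attempt.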
3. Volume Lifecycle Operations (when enable_dhchap: "true")
- CreateVolume: Generate subsystem key (if bidirectional mode) and create subsystem with authentication
- ControllerPublishVolume: Generate/retrieve host key and add host to gateway with authentication
- NodeStageVolume: Retrieve keys from secrets and connect with nvme connect --dhchap-secret command
- ControllerUnpublishVolume: Remove host and delete host key when last namespace detached
- DeleteVolume: Delete subsystem key when last namespace deleted
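To make the NodeStageVolume step concrete, here is a minimal sketch of how the `nvme connect` argument list could be assembled for each mode. `build_nvme_connect_args` is a hypothetical helper; the flags are standard nvme-cli long options. Note that secrets passed on the command line are visible in the process list, so a real implementation may want a different delivery mechanism.

```python
from typing import List, Optional


def build_nvme_connect_args(
    subsystem_nqn: str,
    host_nqn: str,
    traddr: str,
    trsvcid: int,
    host_secret: str,
    ctrl_secret: Optional[str] = None,
) -> List[str]:
    """Build argv for `nvme connect` over TCP with DH-CHAP enabled."""
    args = [
        "nvme", "connect",
        "--transport", "tcp",
        "--traddr", traddr,
        "--trsvcid", str(trsvcid),
        "--nqn", subsystem_nqn,
        "--hostnqn", host_nqn,
        "--dhchap-secret", host_secret,  # host proves its identity
    ]
    if ctrl_secret is not None:
        # Bidirectional mode only: the controller must also prove its identity.
        args += ["--dhchap-ctrl-secret", ctrl_secret]
    return args
```

Unidirectional mode simply omits `ctrl_secret`; the "None" mode would skip both secret flags and use the existing connect path unchanged.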
4. Authentication Modes
- Bidirectional: Host and subsystem both authenticate (mutual authentication)
- Unidirectional: Only host authenticates to subsystem
- None: No authentication (backward compatibility)
5. Error Handling
- Clear error messages when authentication fails
- Graceful handling of missing or invalid keys
Non-Functional Requirements
6. Security
- Keys generated using cryptographically secure random number generators
- Keys never logged or exposed in error messages
- Secrets use restrictive RBAC permissions
- Different keys for different connections (no key reuse across subsystems)
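One possible guard for the "never logged" requirement is to scrub anything that looks like a DH-CHAP secret before a message reaches the log or a gRPC error. This is only a sketch: `redact_secrets` is a hypothetical helper, and the pattern assumes secrets use the `DHHC-1:xx:<base64>:` text form.

```python
import re

# Matches the textual secret form: DHHC-1:<2 hex digits>:<base64>:
_DHCHAP_SECRET = re.compile(r"DHHC-1:[0-9A-Fa-f]{2}:[A-Za-z0-9+/=]+:")


def redact_secrets(message: str) -> str:
    """Replace any embedded DH-CHAP secret with a fixed placeholder."""
    return _DHCHAP_SECRET.sub("DHHC-1:<redacted>", message)
```

This matters most when a failed `nvme connect` invocation is echoed back in an error, since the command line itself contains the secrets.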
7. Compatibility
- Compatible with Linux kernel NVMe driver (nvme-tcp module)
- No breaking changes to existing CSI API
Additional context
1. NOTE: The gateway also supports updating the DH-CHAP key for a host or subsystem. That option is not covered here and is not part of this phase. We still need to discuss how we want to handle key updates.
Architecture Highlights
Key Design Decisions
- One Key Per Node-Subsystem Connection: Each unique node-subsystem pair gets its own DH-CHAP key. This provides:
  - Better security isolation (compromised key only affects one connection)
  - Simple cleanup logic (delete key when connection removed)
  - Per-subsystem access control
- Key Persistence: Keys are stored in Kubernetes Secrets and persist across pod restarts, ensuring consistent authentication without regenerating keys unnecessarily.
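The two decisions above suggest a deterministic Secret name per node-subsystem pair plus a get-or-create lookup. The sketch below uses a plain dict as a stand-in for the Kubernetes Secrets API; the naming scheme and both function names are hypothetical. Hashing the pair is one way to stay within Kubernetes object-name rules, since NQNs contain characters such as `:` that are not valid in names.

```python
import hashlib
from typing import Callable, Dict


def host_key_secret_name(node_id: str, subsystem_nqn: str) -> str:
    """Derive a stable, name-safe Secret name for one node-subsystem pair."""
    pair = f"{node_id}\x00{subsystem_nqn}".encode("utf-8")
    digest = hashlib.sha256(pair).hexdigest()
    return f"nvmeof-dhchap-host-{digest[:32]}"


def get_or_create_key(store: Dict[str, str], name: str,
                      generate: Callable[[], str]) -> str:
    """Reuse the persisted key if present; generate one only on first use."""
    if name not in store:
        store[name] = generate()
    return store[name]
```

Because the name depends only on the pair, a restarted pod recomputes the same name and finds the existing key, and cleanup is a single delete of that name when the connection goes away.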
Implementation References
- Ceph NVMe-oF Gateway: Supports DH-CHAP via subsystem add --dhchap-key and host add --dhchap-key commands
- Linux nvme-cli: Supports DH-CHAP via nvme connect with --dhchap-secret and --dhchap-ctrl-secret options
Example StorageClass Configuration
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-nvmeof-secure
provisioner: nvmeof.csi.ceph.com
parameters:
  # Existing parameters
  subsystemNQN: "nqn.2024-01.io.ceph:csi"
  nvmeofGatewayAddress: "192.168.1.100"
  nvmeofGatewayPort: "5500"
  listeners: '[{"address":"192.168.1.100","port":4420,"hostname":"gw1"}]'
  # NEW: DH-CHAP authentication
  enable_dhchap: "true"
  dhchap_mode: "bidirectional" # Recommended for production
Security Considerations
- Key Rotation: Future enhancement to support periodic key rotation using gateway's change_host_key and change_subsystem_key APIs
- Key Storage: Consider integrating with external key management systems (HashiCorp Vault, KMS??) for enhanced key protection
- Audit Logging: Authentication events should be logged for security monitoring and compliance