- 
                Notifications
    
You must be signed in to change notification settings  - Fork 1.1k
 
[OCPBUGS-58229]: server: Fix network cleanup failures when NetNS path is empty #9410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OCPBUGS-58229]: server: Fix network cleanup failures when NetNS path is empty #9410
Conversation
e9cd871    to
    a662b7c      
    Compare
  
    
          Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@            Coverage Diff             @@
##             main    #9410      +/-   ##
==========================================
- Coverage   67.24%   67.06%   -0.18%     
==========================================
  Files         202      202              
  Lines       28025    28043      +18     
==========================================
- Hits        18845    18807      -38     
- Misses       7602     7662      +60     
+ Partials     1578     1574       -4     🚀 New features to boost your workflow:
  | 
    
a662b7c    to
    082abb6      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, do we need an additional integration test for that?
/hold
082abb6    to
    2d476be      
    Compare
  
    4ef1089    to
    0a6a3d4      
    Compare
  
    ba9c580    to
    1b98d6f      
    Compare
  
    | 
           I validated that upgrading CRI-O to the latest build resolves the issue with the help of this PR: Steps Performed: 
 Output: The pod was successfully created in the test-ai namespace after reboot.  | 
    
| 
           /lgtm  | 
    
| 
           @asahay19: changing LGTM is restricted to collaborators In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.  | 
    
| 
           /hold cancel  | 
    
| 
           CI is broken due to the known issues with ansible snippet and the container-selinux change.  | 
    
| 
           /approve LGTM, @cri-o/cri-o-maintainers PTAL  | 
    
| 
           [APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: haircommander, saschagrunert, sohankunkerkar The full list of commands accepted by this bot can be found here. The pull request process is described here 
Needs approval from an approver in each of these files:
 
      
 Approvers can indicate their approval by writing   | 
    
Fixes an issue where empty network namespace paths cause CNI teardown failures, preventing pod deletion and creating stuck pods. This happens when infra containers die, leaving NetNsPath() returning empty strings that CNI plugins cannot handle. The failure cascade: dead infra container → empty NetNS → CNI failure → StopPodSandbox failure → stuck pod → systemd cgroup conflicts → new pod creation failures. Observed in MicroShift environments. Signed-off-by: Sohan Kunkerkar <[email protected]>
1b98d6f    to
    02cd675      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes network cleanup failures that occur when the network namespace path is empty, which prevents pod deletion and can create stuck pods in production environments. The issue arises when infra containers die, leaving NetNsPath() returning empty strings that CNI plugins cannot handle properly.
- Added early detection and handling for empty network namespace paths during network teardown
 - Modified test to expect network as already stopped when sandbox is stopped
 - Added comprehensive integration test for pod deletion with missing NetNS scenarios
 
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description | 
|---|---|
| server/sandbox_network.go | Adds logic to handle empty NetNS paths gracefully during network cleanup | 
| server/sandbox_stop_test.go | Updates test expectations and removes unused import | 
| test/network.bats | Adds integration test for pod deletion when NetNS path is missing | 
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| . "github.com/onsi/ginkgo/v2" | ||
| . "github.com/onsi/gomega" | ||
| "go.uber.org/mock/gomock" | ||
| types "k8s.io/cri-api/pkg/apis/runtime/v1" | ||
| ) | 
    
      
    
      Copilot
AI
    
    
    
      Sep 17, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of the gomock import appears incomplete. The removed test case that used gomock.InOrder() and gomock.Any() suggests this import was necessary for proper mock testing. Consider whether this test removal reduces coverage for error handling scenarios.
| log.Debugf(ctx, "CNI teardown failed due to missing/invalid network namespace for pod sandbox %s(%s): %v", sb.Name(), sb.ID(), err) | ||
| 
               | 
          ||
| // Clean up CNI result files even when NetNS is invalid. | ||
| s.cleanupCNIResultFiles(ctx, sb.ID()) | 
    
      
    
      Copilot
AI
    
    
    
      Sep 17, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cleanupCNIResultFiles method is called but there's no documentation or context about what this method does or whether it can fail. Consider adding a comment explaining its purpose and whether its failure should be handled.
| 
           /retest  | 
    
| 
           /retest-required  | 
    
| 
           /lgtm  | 
    
| 
           /hold cancel  | 
    
| 
           /cherry-pick release-1.34  | 
    
| 
           /cherry-pick release-1.33  | 
    
| 
           @sohankunkerkar: new pull request created: #9471 In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.  | 
    
| 
           @sohankunkerkar: new pull request created: #9472 In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.  | 
    
Fixes an issue where empty network namespace paths cause CNI teardown failures, preventing pod deletion and creating stuck pods. This happens when infra containers die, leaving NetNsPath() returning empty strings that CNI plugins cannot handle.
The failure cascade: dead infra container → empty NetNS → CNI failure → StopPodSandbox failure → stuck pod → systemd cgroup conflicts → new pod creation failures. Observed in MicroShift production environments.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for your reviewer:
The network cleanup failure is blocking pod deletion, which is causing the systemd transaction conflicts.
Does this PR introduce a user-facing change?