Skip to content

Conversation

@fidencio
Copy link
Contributor

@fidencio fidencio commented Nov 6, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

This patch fixes the situation of kubectl cp hanging forever.

Debugging this issue I've faced a quite interesting situatuon where after closeIOChan is closed, the CloseStdin functions "resumes" while closeIOChan is nil, making the function to sit there and wait forever.

Honestly, this bugs my mind quite a bit as I was under the impression that all the everyone waiting for the channel to be closed would be notified, would consume its content, and only after that it'd be nullified. Well, Today I learnt.

The simplest and more future proof way to work this issue around is to ensure closeIOChan is not nil inside CloseStdin's function.

Thanks to @liubin, @haircommander, and @r4f4 for being available to discuss the issue.

Which issue(s) this PR fixes:

Fixes #4353

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

@openshift-ci-robot openshift-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 6, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 6, 2020
@fidencio
Copy link
Contributor Author

fidencio commented Nov 6, 2020

/area-vm

@fidencio fidencio force-pushed the wip/runtime-vm-fix-kubectl-cp-regression branch 2 times, most recently from 832d7e9 to 7c384c0 Compare November 6, 2020 22:48
This patch fixes the situation of `kubectl cp` hanging forever.

Debugging this issue I've faced a quite interesting situatuon where
after `closeIOChan` is closed, the `CloseStdin` functions "resumes"
while `closeIOChan` is **nil**, making the function to sit there and
wait forever.

Honestly, this bugs my mind quite a bit as I was under the impression
that all the everyone waiting for the channel to be closed would be
notified, would consume its content, and only after that it'd be
nullified.  Well, Today I learnt.

The simplest and more future proof way to work this issue around is to
ensure `closeIOChan` is not **nil** inside CloseStdin's function.

Thanks to @liubin, @haircommander, and @r4f4 for being available to
discuss the issue.

Fixes: cri-o#4353

Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio fidencio force-pushed the wip/runtime-vm-fix-kubectl-cp-regression branch from 7c384c0 to 6e897b8 Compare November 6, 2020 22:50
@fidencio
Copy link
Contributor Author

fidencio commented Nov 6, 2020

@haircommander, updated in order to keep the defer() function. Thanks for the suggestion.

@codecov
Copy link

codecov bot commented Nov 6, 2020

Codecov Report

Merging #4354 (6e897b8) into master (943f033) will not change coverage.
The diff coverage is 0.00%.

@@           Coverage Diff           @@
##           master    #4354   +/-   ##
=======================================
  Coverage   39.36%   39.36%           
=======================================
  Files         112      112           
  Lines        8825     8825           
=======================================
  Hits         3474     3474           
  Misses       4965     4965           
  Partials      386      386           

@fidencio fidencio changed the title runtime_vm: Don't nullify closeIOChan channel runtime_vm: Ensure closeIOChan is not nil inside CloseStdin's function … Nov 7, 2020
Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fidencio, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [fidencio,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fidencio
Copy link
Contributor Author

fidencio commented Nov 7, 2020

/retest

2 similar comments
@fidencio
Copy link
Contributor Author

fidencio commented Nov 7, 2020

/retest

@fidencio
Copy link
Contributor Author

fidencio commented Nov 7, 2020

/retest

@fidencio
Copy link
Contributor Author

fidencio commented Nov 7, 2020

/test e2e-aws

@fidencio
Copy link
Contributor Author

fidencio commented Nov 8, 2020

/retest

2 similar comments
@fidencio
Copy link
Contributor Author

fidencio commented Nov 9, 2020

/retest

@fidencio
Copy link
Contributor Author

fidencio commented Nov 9, 2020

/retest

@haircommander
Copy link
Member

/retest
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 9, 2020
@TomSweeneyRedHat
Copy link
Contributor

LGTM

@fidencio
Copy link
Contributor Author

fidencio commented Nov 9, 2020

/retest

1 similar comment
@fidencio
Copy link
Contributor Author

fidencio commented Nov 9, 2020

/retest

@openshift-merge-robot openshift-merge-robot merged commit 8e23406 into cri-o:master Nov 9, 2020
@fidencio
Copy link
Contributor Author

fidencio commented Nov 9, 2020

/cherry-pick release-1.19

@fidencio
Copy link
Contributor Author

fidencio commented Nov 9, 2020

/cherry-pick release-1.18

@openshift-cherrypick-robot

@fidencio: new pull request created: #4359

Details

In response to this:

/cherry-pick release-1.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@fidencio: new pull request created: #4360

Details

In response to this:

/cherry-pick release-1.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fidencio fidencio deleted the wip/runtime-vm-fix-kubectl-cp-regression branch November 9, 2020 21:20
@openshift-merge-robot
Copy link
Contributor

@fidencio: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/openshift-jenkins/e2e_features_fedora 6e897b8 link /test e2e_features_fedora
ci/openshift-jenkins/integration_fedora 6e897b8 link /test integration_fedora
ci/openshift-jenkins/integration_crun 6e897b8 link /test integration_crun
ci/prow/e2e-aws 6e897b8 link /test e2e-aws

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

runtime_vm: kubectl cp ... may hang forever

7 participants