- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.1k
Revert "integration-playbook: Idempotent Swapping" #1058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This reverts commit b198c57. Signed-off-by: Mrunal Patel <[email protected]>
| /test ami | 
| LGTM if makes tests happy again, great Dan! | 
| LGTM, fingers crossed for the tests. | 
| @rhatdan @mrunalp @rhatdan Great find, but I wouldn't consider this a long-term fix. As discussed on IRC - this was originally added to prevent an OOM during cri-o compiling. While that may have been subsequently resolved by jamming more memory onto the VMs, I think it's a mistake to assume test-subject swap is not needed for every test. Also, if a test-subject already has swap setup, this change will not resolve the issue (the playbook does nothing in that case). A better long-term solution would be to fix the few (one) specific test that needs to control it's memory placement. i.e. Lock the pages into pages into main memory, preventing them from being swapped. IIRC, there's a command or systemcall that's used to do this...can't remember what it's called at the moment....@mrunalp and I will keep looking for it... | 
| @rhatdan @mrunalp found it...this is probably what the test should do: 
 This way avoids putting the cart before the horse (for the benefit of a single test). It also better exercises this situation under real-world conditions (the vast majority of customer setups I've seen, use swap). Perhaps this test is even playing in the realm of the OOMKiller firing off against something important - since the test cannot predict what other consumers are doing. An unlikely situation to be sure, but one that creates an even harder to debug situation. | 
| Ohh...another thing.  At the end, the test could check it's  | 
| I checked that the rhel ami job has passed the oom test :) | 
| 
 Our limit is just under 5M. This test is for the container not wreaking havoc on the entire system :) 
 No real world go programs typically call this. | 
| 🎉 | 
| 
 Oh good, that's very reasonable. Hmmm, you sure there's not a systemd/kernel bug here? I would think that a limit should be honored whether or not swap is available. Otherwise, what's the point in having a limit at all? If there is a bug, ignoring it in the "framework" is the most difficult way to silence it, just ask @edsantiago. For things like this, it's way easier to either fix the underlying problem, or skip the test. 
 Tests != real-world, ever, and by definition (though sometimes they simulate it). ...wait...doesn't that mean if reality is a simulation, then, like, maybe, that, then we could also, like, be in, like, some universal unittest...whoa...deep... | 
| Re: cgroup bug: From the kernel docs, though I don't fully parse all of what it says: 
 So, to me that reads like swap limiting needs to be handled specially by the application when setting up these limits for the cgroup(s) of a child.  This is also the likely reason why docker has separate  Without other details, (I dunno the original situation) this sounds like a bug that's being uncovered by the OOM-test failure, and something that absolutely should NOT be overlooked/ignored. Though I also reserve the right to be dead wrong, it would be nice is someone tried to convince me 😀 | 
| @mrunalp thx for the chat. So to summarize, this isn't a cri-o bug because... 
 This PR is just a quick-fix to get CRI-O out the door, later we can consider bringing it back in and/or fixing the test so swap doesn't get in it's way. | 
| @runcom @mrunalp Ugg, okay, just realized, this isn't actually going to fix the problem because...you've reverted one implementation of swap-enabling for an older (but worse) one. Meaning: Swap is still turned on. It's probably just accidentally not working, because of Origin-CI image caching...I've got an idea for a better fix... | 
| ...opened #1063 | 
This reverts commit b198c57.
Signed-off-by: Mrunal Patel [email protected]