Revert HOLD preference; instead don't trust reweighting negative nodes #87
This looks safe to me. I think this message resolution order was already possible before, where pings to will-have-negative-weight nodes could resolve before the reweight became in-progress, so we're now just guaranteeing that this is the effective message outcome every time. Whether that fixes things without causing more problems is less clear to me, but I'm optimistic.
From offline discussion
I really thought I did something in #86 but turns out I was wrong.
The situation I observed was not simply due to an unfortunately timed reweight + rewind -- it was only possible due to a very unlucky combination of stale information + simultaneous reweights.
Revisiting the offending problem graph -- the only reason that 11241 is able to make it to its reweight critical section is that the pong from 11226 that contributed to its reweight was so stale that 11226 was still unweighted and unmatched at the time the pong was sent.

This stale pong `AUGMENT 2 #<11241>--->#<11226> from #<11226>` then happens to have the same weight as the soft pong that is returned to 11241 during CHECK-REWEIGHT, as 11226 is in the middle of a CONTRACT 2 reweight and has been de-weighted back to 0.

As such, we get a `HOLD 2 #<11241>--->#<11226> from #<14111>` pong and think everything is fine because the lowest-weight rec is still 2, so we clear ourselves to reweight. And then, because 11241 is a solo node (whereas 11226 is part of a 3-tree), it is able to reweight, re-check (deciding that everything is once again fine), and finalize the reweight before 11226's tree realizes it has reweighted too much and needs to rewind, resulting in a negative-weight edge between 11241 and 11226.

Clearly this is very rare, but of course that has never stopped me, so after much consternation and deep thought I realized that we basically just need a way to say "don't trust mid-reweight pongs from negative nodes." However, making them non-pingable is a recipe for deadlock, and so what I came up with is having the negative nodes "pretend" that they actually haven't reweighted until the operation is fully finalized. We do this by "stashing" their original weights in a slot before the reweight happens and returning this value when pinged. This value is then "unstashed" at the end of the reweight operation.
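For readers who want the stash/unstash idea in concrete form, here is a minimal sketch in Python. It is not the code in this PR: the `Node` class, its field names, and every method name are hypothetical, and the `rewind_reweight` behavior (restoring the stashed weight outright) is an assumption made only to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a negative node; not the project's class."""
    name: str
    weight: int = 0
    # Holds the pre-reweight weight only while a reweight is in flight.
    stashed_weight: Optional[int] = None

    def begin_reweight(self, delta: int) -> None:
        # Stash the original weight, then apply the tentative change.
        self.stashed_weight = self.weight
        self.weight += delta

    def reported_weight(self) -> int:
        # What a ping sees: the stashed weight while the reweight is
        # unfinalized, otherwise the live weight.
        if self.stashed_weight is not None:
            return self.stashed_weight
        return self.weight

    def finalize_reweight(self) -> None:
        # "Unstash": from now on pings see the new weight.
        self.stashed_weight = None

    def rewind_reweight(self) -> None:
        # Assumed rollback: restore the stashed weight and clear the slot.
        if self.stashed_weight is not None:
            self.weight = self.stashed_weight
            self.stashed_weight = None


node = Node("11226", weight=2)
node.begin_reweight(-2)                  # a contraction by 2 is in progress
seen_mid_reweight = node.reported_weight()
node.finalize_reweight()
seen_after_finalize = node.reported_weight()
```

The point of the design is that the node stays pingable throughout, so nothing blocks waiting on it, but a pinger can never observe a weight that might still be rewound.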
NB: This basically reverts all the material changes in #86 (only the logging improvements remain).