Skip to content

Conversation

@phillxnet
Copy link
Member

@phillxnet phillxnet commented Aug 7, 2017

Prior to this pr scrub_status() reported only 'unknow', 'running', and 'finished'. The scrub halted (interrupted) state was misreported as running which lead to buggy behaviour. This pr improves scrub status reporting resolution to address the consequent buggy behaviour and improve user experience re scrub status.

Elements of this pull request:

  • Fix portability bug in fs unit tests. Fixes fix portability bug in fs unit tests #1787
  • Add unit tests to account for current expected behaviour of scrub_status().
  • Add unit test to indicate current suspected false positive for 'running' status on 'interrupted' scrubs.
  • improve scrub status reporting resolution to include halted (interrupted), conn-reset (Connection reset by peer), and cancelled (aborted).
  • Pool scrub job monitoring was also extended to include 'halted' and 'cancelled' as 'non running' jobs.

Fixes #1786

And as #1786 was essentially a duplicate of #1640 (with more specificity) we also have:
Fixes #1640

Note that this pr also includes a trivial fix for:
issue #1787 "fix portability bug in fs unit tests" so adding the also fixes:
Fixes #1787

./bin/test --settings=test-settings -v 3 -p test_btrfs*

results in:

...
test_scrub_status_cancelled (fs.tests.test_btrfs.BTRFSTests) ... ok
test_scrub_status_conn_reset (fs.tests.test_btrfs.BTRFSTests) ... ok
test_scrub_status_finished (fs.tests.test_btrfs.BTRFSTests) ... ok
test_scrub_status_halted (fs.tests.test_btrfs.BTRFSTests) ... ok
test_scrub_status_running (fs.tests.test_btrfs.BTRFSTests) ... ok
...

All scrub states were reproduced on real hardware and their UI reflection confirmed. Note that to reproduce the halted ('interrupted') state the test machine was shutdown 'mid scrub'. Upon next boot the halted ('interrupted') state was in effect. And to reproduce 'conn_reset' / time out warning state a second scrub with the -f option was enacted over and existing running one. This caused interactivity issues and upon the first scrub finishing the conn_reset status was observed on an active (and waiting) btrfs scrub status command.

A trivial amount of code duplication was created but was highlighted with a "TODO:" and is very local in nature: within the same if-else block.

Ready for review.

Not related to indicated issue but fixed as
additional tests are to be added as part of
rockstor#1786 and all tests should pass prior to that
work.
Includes the following states: running, finished,
halted (interrupted), conn_reset (Connection reset
by peer / no stats available), and cancelled
(aborted).
Prior reported states were running, finished,
and unknown. This caused buggy behaviour as
halted (interrupted) was mis-reported as running.
Add 'Connection reset by peer', halted (interrupted),
cancelled (aborted).
Pool scrub job monitoring was also extended to
include 'halted' and 'cancelled' as 'non running'
jobs.
Code comments added.
out, err, rc = run_command([BTRFS, 'scrub', 'status', '-R', mnt_pt])
if (len(out) > 1):
if (re.search('running', out[1]) is not None):
if err != [''] and len(err) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this line reads as if len(err) != 0 and len(err) > 0
can't this just be if len(err) > 0:?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll revisit this soon as I have a few tidy/checkup's/todo's pending now and have in mind a suitable issue against which I can do them. But I think it's right as your analogy would hold if it were err != [] as len([]) = 0, however len(['']) = 1 and I think I was after:

if not an empty string in list (specifically) and at least a single element (to ensure a safe 0 index).

But yes given we then check the exact contents of err[0] directly there after the first if clause is redundant. However given it's the most common return (err=[''] is the norm) it's a rapid and cheap first check to avoid all others: I had imagined we would have occasion to add additional inner ifs than the one I added there.

I can at least clarify with comments on the why.

Cheers.

@schakrava
Copy link
Member

Thank you @phillxnet , this is nice. I added a small comment, which is minor and you may address in a future pull request.

@schakrava schakrava merged commit 32ed98c into rockstor:master Aug 8, 2017
@phillxnet phillxnet deleted the 1786_improve_scrub_status_reporting_resolution branch August 9, 2017 11:31
@phillxnet
Copy link
Member Author

@schakrava Thanks for the review and Cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix portability bug in fs unit tests improve scrub status reporting resolution SCRUB fails to end / fails to restart after shutdown

2 participants