Skip to content

Conversation

@kcalvinalvin
Copy link
Collaborator

Change Description

Currently btcd keeps downloading blocks and fails to verify them if the disk is out of space. We exit immediately if we detect that the disk is out of space to ensure the database is at a recoverable state later on.

Steps to Test

Tested by using a macbook with full disk and checking if the program shuts down as expected. Verified that it's also able to recover when given enough storage again.

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error paths) are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

📝 Please see our Contribution Guidelines for further guidance.

@coveralls
Copy link

coveralls commented Apr 14, 2025

Pull Request Test Coverage Report for Build 14848505876

Details

  • 2 of 24 (8.33%) changed or added relevant lines in 2 files are covered.
  • 6 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.1%) to 56.743%

Changes Missing Coverage Covered Lines Changed/Added Lines %
database/ffldb/blockio.go 2 12 16.67%
database/ffldb/dbcache.go 0 12 0.0%
Files with Coverage Reduction New Missed Lines %
peer/peer.go 6 74.23%
Totals Coverage Status
Change from base Build 14316804349: 0.1%
Covered Lines: 31143
Relevant Lines: 54884

💛 - Coveralls

if errno, ok := pathErr.Err.(syscall.Errno); ok &&
errno == syscall.ENOSPC {

log.Errorf("%v. Cannot save any more blocks "+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also mention it's due to the disk being full so the operator can act accordingly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest commit

@saubyk saubyk added this to the v0.25 milestone Apr 14, 2025
@kcalvinalvin kcalvinalvin force-pushed the 2025-02-17-exit-when-running-out-of-disk-space branch from 941bb30 to 8ecdf96 Compare April 21, 2025 11:02
@saubyk saubyk requested a review from yyforyongyu April 21, 2025 23:39
Copy link
Collaborator

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM🌺

Currently btcd keeps downloading blocks and fails to verify them if the
disk is out of space. We exit immediately if we detect that the disk is
out of space to ensure the database is at a recoverable state later on.
@kcalvinalvin kcalvinalvin force-pushed the 2025-02-17-exit-when-running-out-of-disk-space branch from 8ecdf96 to cabd365 Compare May 5, 2025 23:43
@kcalvinalvin kcalvinalvin requested a review from starius May 5, 2025 23:43
Copy link
Contributor

@starius starius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
I tested this in a small filesystem. Works as expected!

@guggero
Copy link
Collaborator

guggero commented May 6, 2025

Just a side note here: I'm not super familiar with the internals of btcd, so not sure if the same applies here. But in lnd we avoid using os.Exit() because that prevents any defer cleanup statements to run (e.g. closing network or database connections, flushing stuff to disk, closing files and so on).

That's why we're using a ShutdownLogger instead, which causes the main Goroutine to initiate a clean shutdown whenever a message with the level Critical is logged.
Might be worth considering here.

@kcalvinalvin
Copy link
Collaborator Author

Just a side note here: I'm not super familiar with the internals of btcd, so not sure if the same applies here. But in lnd we avoid using os.Exit() because that prevents any defer cleanup statements to run (e.g. closing network or database connections, flushing stuff to disk, closing files and so on).

That's why we're using a ShutdownLogger instead, which causes the main Goroutine to initiate a clean shutdown whenever a message with the level Critical is logged. Might be worth considering here.

btcd has one database that handles everything and the flush happens here:

btcd/btcd.go

Lines 133 to 137 in cd05d9a

defer func() {
// Ensure the database is sync'd and closed on shutdown.
btcdLog.Infof("Gracefully shutting down the database...")
db.Close()
}()

That's a fair point and I did think about returning an error and letting the caller handle the out of disk error but ultimately thought this was the best way to handle things. The database only ever writes to the disk at the places this code change takes place and if that's full, there's nothing else to flush for the caller because anything that the caller tries to flush follows the same code path.

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 📮

@Roasbeef Roasbeef merged commit 1eb974a into btcsuite:master May 6, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants