max-wittig
Contributor

@max-wittig max-wittig commented May 5, 2017

Wiki page: S3-Backup

@max-wittig max-wittig force-pushed the dev/s3-backup branch 2 times, most recently from 8ff6953 to 01448f7 Compare May 5, 2017 10:17
@max-wittig
Contributor Author

Successfully tested on dev-fossology

@max-wittig max-wittig changed the title WIP: chore(backup): add s3 backup script chore(backup): add s3 backup script May 5, 2017
@max-wittig max-wittig force-pushed the dev/s3-backup branch 6 times, most recently from b976e33 to 7245350 Compare May 12, 2017 09:37
@max-wittig max-wittig force-pushed the dev/s3-backup branch 7 times, most recently from a4fbdd8 to 46143d5 Compare June 20, 2017 12:08
@max-wittig max-wittig force-pushed the dev/s3-backup branch 12 times, most recently from c8e9cd2 to 275bdbe Compare July 6, 2017 13:19
fi

if [[ ${1} = "-l" ]]; then
LOGGING=${2}
Member

It might be better to call this variable LOG or LOGFILE

@maxhbr
Member

maxhbr commented Jul 12, 2017

Please do not create wiki pages before the corresponding features are merged.

@max-wittig max-wittig force-pushed the dev/s3-backup branch 6 times, most recently from 2cfa4c3 to 7e0d014 Compare July 13, 2017 14:36
@max-wittig max-wittig force-pushed the dev/s3-backup branch 2 times, most recently from 8edf22a to cc4e2be Compare July 18, 2017 14:54
Member

@maxhbr maxhbr left a comment

some comments


echo Start database dump at `date`
#### postgresql needs to be restarted for the dump to work
/etc/init.d/postgresql restart
Member

Why not just /etc/init.d/postgresql start?

Contributor Author

Because it needs to be restarted for the dump to work. I don't think start would work.

Member

@maxhbr maxhbr Jul 19, 2017

I don't see why this should be necessary (except for reloading the config, which you do not change).

Member

or maybe this forces the DB to write everything out...


#### sync repository
echo "Uploading and starting backup to S3..."
tar -zcvf - /srv/fossology/ | ${DIR}/venv/bin/aws --debug s3 cp --expected-size 200000000000 --sse AES256 - "s3://${BUCKET}/${FILENAME}"
Member

--sse AES256 is the default value

Contributor Author

Yes, but the default value might change over time, and then it wouldn't work anymore. What do you think?



echo Creating diff of repository at `date`
ls -RlahGg --time-style='+' /srv/fossology/repository/ | cut -d ' ' -f 2-5 | ${DIR}/venv/bin/aws s3 cp --sse AES256 - "s3://${BUCKET}/${FILENAME}-repo-diff.txt"
Member

Why is this recursive? This cut splits folders with spaces in their names in arbitrary places.
Why do you keep the inode-count?

Please add variables for magic strings like "s3://${BUCKET}/${FILENAME}-repo-diff.txt" which are used in multiple places.
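
A minimal sketch of what such variables could look like, assuming BUCKET and FILENAME are set earlier in the script. The names BACKUP_URI and REPO_DIFF_URI and the default values are hypothetical and only serve as illustration:

```shell
#!/usr/bin/env bash
# Hypothetical defaults so the sketch runs on its own; the real script
# is assumed to set BUCKET and FILENAME before this point.
BUCKET="${BUCKET:-example-bucket}"
FILENAME="${FILENAME:-fossology-backup}"

# Build each S3 location once and reuse it wherever it is needed.
BACKUP_URI="s3://${BUCKET}/${FILENAME}"
REPO_DIFF_URI="${BACKUP_URI}-repo-diff.txt"

echo "Backup target:    ${BACKUP_URI}"
echo "Repository diff:  ${REPO_DIFF_URI}"
```

The upload and download commands could then refer to "${REPO_DIFF_URI}" instead of repeating the string.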

Contributor Author

@max-wittig max-wittig Jul 19, 2017

splits folders with spaces in their names in arbitrary places

No, it just splits the column. There is a tab in there, not spaces.

Member

@maxhbr maxhbr Jul 19, 2017

Since you are listing recursively, ls -RlahGg --time-style='+' also lists the names of the corresponding folders as the first "word" in a line (for subdirectories). If they contain spaces, you are splitting them and keeping part of them.

Contributor Author

Do you happen to know a way to remove the inode-count, besides cut?

Member

example:

$ tree ./
./
├── Folder with splaces
│   └── file
└── otherFile

and

$ ls -RlahGg --time-style='+' . | cut -d ' ' -f 2-5
.:
20K
 3 4,0K 
16  12K 
 2 4,0K 
 1  

with splaces:
8,0K
2 4,0K  .
3 4,0K  ..
1  

Note:

Here you can also see that, on my system, the output of ls -RlahGg --time-style='+' contains spaces and not tabs, so the filenames are not listed.

Contributor Author

@max-wittig max-wittig Jul 19, 2017

You're right. We could also just use tree for the diff, right? Only the file size would be gone then. Or what do you think would be the best way to approach this?

Member

@maxhbr maxhbr Jul 19, 2017

If you want a list of all files together with their sizes use something like

find /srv/fossology/repository/ -type f -exec du -h {} \;

Contributor Author

Thank you!


echo Checking diff of repository at `date`
DIFF=`${DIR}/venv/bin/aws s3 cp --sse AES256 s3://${BUCKET}/${FILENAME}-repo-diff.txt -`
CURRENT_DIFF=`ls -RlahGg --time-style='+' /srv/fossology/repository/ | cut -d ' ' -f 2-5`
Member

You are calling ls -RlahGg --time-style='+' /srv/fossology/repository/ in multiple places; please move it into a function.
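
One way such a function could look, as a sketch only: it uses the find/du listing suggested earlier in this review instead of ls and cut, so names containing spaces stay intact. The function name list_repository is hypothetical, and the sort step is an added assumption to keep the output order stable between runs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<size><TAB><path>" for every regular file
# below the given directory, sorted by path.
list_repository() {
  local dir="$1"
  find "${dir}" -type f -exec du -h {} \; | LC_ALL=C sort -k 2
}

# Possible usage, with the repository path taken from the script under review:
#   list_repository /srv/fossology/repository/
```

Both the upload step and the later comparison could call the same function, so the two listings are always produced the same way.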


echo Start removing old data at `date`

${DIR}/../../fo-cleanold --delete-repository --delete-database
Member

You might need to ensure that the database is running.
This might delete production data; you should display a warning and give the option to exit.
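
A sketch of such a warning, assuming the script is run interactively. The function name confirm and the prompt wording are hypothetical; anything other than an explicit "y" or "Y", including an empty or missing answer, is treated as "no":

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask a yes/no question and return 0 only on "y" or "Y".
confirm() {
  local prompt="$1" answer
  # Print the prompt to stderr so it does not end up in piped output.
  printf '%s [y/N] ' "${prompt}" >&2
  # A failed read (for example, no stdin) counts as "no".
  read -r answer || return 1
  [[ "${answer}" == "y" || "${answer}" == "Y" ]]
}

# Possible usage before the destructive step:
#   confirm "This deletes the current repository and database. Continue?" || exit 1
```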

# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
source ${DIR}/before
Member

You are sourcing the before script before setting set -x. You can't be sure that $FILENAME is set and that Apache is stopped.

su postgres -c "psql -f ${TMP_FILE}"
echo --------- End database log ---------
else
su postgres -c "psql -f ${TMP_FILE}" > /dev/null 2>&1
Member

Why do you also suppress STDERR?

Contributor Author

Because many errors come up, for example because tables already exist.

#### postgresql needs to be restarted for the dump to work
/etc/init.d/postgresql restart
#### these steps are split, because of permissions
TMP_FILE=`mktemp`
Member

This is already done in before.

@max-wittig max-wittig force-pushed the dev/s3-backup branch 4 times, most recently from 034a6ff to ff3bb1a Compare July 26, 2017 06:24
@maxhbr maxhbr added the WIP label Jul 27, 2017
@max-wittig max-wittig force-pushed the dev/s3-backup branch 4 times, most recently from fc88e17 to e08f5bd Compare August 31, 2017 08:07
echo "Please specify a file, where to save the log file to"
exit 1
fi
else
Member

The indentation is confusing here.

exit 1
fi
else
exec &> >(tee -a ${LOGGING})
Member

You only call this command if the argument "-l" was not passed. That looks wrong.

Member

The same applies in the other file.
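
A sketch of the intended control flow, assuming "-l FILE" is the only option and that output should go through tee only when it is given. The function name setup_logging is hypothetical, and LOGFILE follows the naming suggested earlier in this review:

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable logging to a file only when "-l FILE" is passed.
setup_logging() {
  if [[ "${1:-}" == "-l" ]]; then
    if [[ -z "${2:-}" ]]; then
      echo "Please specify a file to save the log to" >&2
      return 1
    fi
    LOGFILE="${2}"
    # From here on, send stdout and stderr to the terminal and append to LOGFILE.
    exec &> >(tee -a "${LOGFILE}")
  fi
}

# Possible usage at the top of the script:
#   setup_logging "$@" || exit 1
```

Without "-l" the function does nothing, so the script's output is left untouched.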

@mcjaeger
Member

@max-wittig how about writing down the current status of this pull request?

@max-wittig
Contributor Author

The pull request is working, but it's not possible to run the restore and backup scripts at the same time.
There is also no safety measure to ensure that a full backup is downloaded.

@mcjaeger mcjaeger merged commit 48b72a4 into fossology:master Sep 28, 2017