Motivation and use case
In this post I will explain how to automate the process of backing up a Postgres database table to a GoogleDrive cloud storage location. I encountered this problem when helping my dad out with some IT-related problem, which included the backup process of a PostgreSQL database table. The solution I provide here can, however, easily be adapted to back up whatever you like and upload it to some GoogleDrive folder.
In my approach, we use two scripts to accomplish this: one script to produce the backup files and a second script that takes care of the uploading process. If you wish to upload something else, simply adapt the shell script, which, in this case, generates the .sql dumps of the desired Postgres database table.
For now, clone the repository and we are good to go:
git clone https://github.com/frietz58/postgres_googledrive_backup.git
If you just want to get started, the README of this project on Github contains all the relevant steps as well ;)
Note that the installation process requires a desktop environment and a browser, so straight up installing this on a headless linux distro won’t work, for reasons explained below.
Dumping a postgres table
First, we will take a look at how we can back up a postgres database table (if you wish to back up something else, you should start here). For this, we use the command pg_dump [dbname], which can create script or archive dumps of any given database. I’ve chosen to use script dumps, which are
"plain-text files containing the SQL commands required to reconstruct the database to the state it was in at the time it was saved." postgres documentation
Specifically, in the cron_backup.sh script, the line that produces the .sql dump file is:
pg_dump $target_db > $save_dir/$timestamp"_dump_"$ip4_addr".sql"
This creates the .sql dump of $target_db in the folder $save_dir, where the name of the dump file contains the current timestamp $timestamp and the IPv4 address $ip4_addr of the machine on which the backup was created. Regarding the postgres database, you need to consider that, by default, not every user is allowed to access the database tables. If you just want to create a single backup right now, you could use the following command:
sudo -i -u postgres pg_dump $target_db > dump.sql
The rest of the cron_backup.sh script parses the .yaml configuration file, sets variables like $timestamp and echoes some log messages. If you don’t need dynamic file names and can assure that the $save_dir always exists, you could drastically shorten the script :)
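If you go that route, the core of the script fits in a few lines. Here is a minimal sketch of the same idea in Python rather than shell (the database name and directory are assumptions, and the IP-address suffix is omitted for brevity):

import subprocess
from datetime import datetime

target_db = "mydb"          # assumed database name
save_dir = "/var/backups"   # assumed to already exist
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

# equivalent of: pg_dump $target_db > $save_dir/$timestamp"_dump.sql"
with open(f"{save_dir}/{timestamp}_dump.sql", "w") as f:
    subprocess.run(["pg_dump", target_db], stdout=f, check=True)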
Uploading to GoogleDrive
The second script, drive_upload.py, takes care of uploading the files into a GoogleDrive cloud folder. It uses the GoogleDrive v3 API. Before that works, you will have to install the requirements, preferably in a virtualenv. So create and activate the virtualenv:
virtualenv -p python3 postgres_venv
source postgres_venv/bin/activate
pip install -r postgres_googledrive_backup/requirements.txt
Once you’ve installed the requirements, you need to enable the Drive API for your Google account. I recommend that you create a new Google account specifically for storing the backups, which is what we will do in the next section.
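For reference, the upload itself presumably boils down to a files().create call against the Drive v3 API. A minimal sketch of what drive_upload.py does under the hood (the file name, folder id and token path are assumptions, not the script’s actual values):

import pickle

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# load the OAuth token acquired in the next section (path is an assumption)
with open("token.pickle", "rb") as f:
    creds = pickle.load(f)

service = build("drive", "v3", credentials=creds)
file_metadata = {
    "name": "2021-01-01_dump.sql",  # hypothetical dump file name
    "parents": ["<FOLDER_ID>"],     # hypothetical GoogleDrive folder id
}
media = MediaFileUpload("2021-01-01_dump.sql", mimetype="text/plain")
service.files().create(body=file_metadata, media_body=media, fields="id").execute()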
Enabling the Drive v3 API
Before we can automate the upload process, we must enable the Google Drive v3 API and in the process acquire the two files credentials.json and token.pickle. Those files are needed by the drive_upload.py script to authenticate the connected Google account using OAuth 2.0.
Acquiring credentials.json file
Enabling the Drive API (under step one) on this page will download the file credentials.json. For now you can save this file wherever you like, but make sure to set the path in the config.yaml file (credentials_path: /path/to/credentials.json), so that the drive_upload.py script will know where the file is located.
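Putting it all together, the config.yaml might look roughly like this (apart from the credentials path, the key names are assumptions and may differ from the actual repo):

# hypothetical config.yaml layout
target_db: mydb
save_dir: /home/pcadmin/backups
credentials_path: /path/to/credentials.json
token_path: /path/to/token.pickle

Once that is done, we can finally obtain the token.pickle file, which is the last piece that is still missing.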
Acquiring token.pickle file
Even though I am not very happy about it, the last step requires you to have a desktop environment with a working browser. I am not sure why Google made it this way, but I assume they want to ensure that you can only give a script or app access to your drive storage directly, in an interactive manner.
Given that you have set the path to the credentials.json file in the config.yaml file, execute the script:
python drive_upload.py -cf /path/to/config.yaml
This will open your default browser and ask you to allow “Quickstart” access to that Google account’s drive storage.
If everything worked, this will create the file token.pickle in your current working directory and output Token file has been acquired, exiting... to your console. Make sure to adjust the path of the token.pickle file in the config.yaml file, so that drive_upload.py finds the token independent of your working directory.
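Under the hood, this token dance presumably follows Google’s quickstart pattern; a minimal sketch of how such a script acquires and reuses the token (paths and scope are assumptions):

import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/drive.file"]  # assumed scope
creds = None

# reuse an existing token if we have one
if os.path.exists("token.pickle"):
    with open("token.pickle", "rb") as f:
        creds = pickle.load(f)

if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())  # silent refresh, no browser needed
    else:
        # this is the step that requires a desktop environment: it opens a browser
        flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
        creds = flow.run_local_server(port=0)
    with open("token.pickle", "wb") as f:
        pickle.dump(creds, f)

The refresh branch is also why the token keeps working on a headless machine later on: once token.pickle exists, no browser interaction is required anymore.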
And that’s it. You could now run both scripts manually and it would upload the .sql dumps into the GoogleDrive folder set in the config.yaml file. But we are developers, there is no fun in doing things manually. So let’s automate the entire workflow using the crontab in the next and final section of this post.
Automatic execution via Crontab
Now that the files credentials.json and token.pickle have been acquired, we can also use this on any headless linux distro. Simply scp the entire postgres_googledrive_backup folder to your headless linux distro, create a virtualenv there and install the requirements once again. The important thing is to make sure that the paths in the config.yaml file are correct.
To automatically create backups using the cron_backup.sh script and upload those into the cloud with the drive_upload.py script, we will use the crontab, which allows us to execute arbitrary commands on a regular schedule.
Because only the postgres user has access to the postgres database tables that we wish to back up, we must use the postgres user’s crontab to schedule the backups. I assume that you have postgres installed, given that this post deals with backing up a postgres database table… We can use the su command to become the postgres user like this:
sudo su postgres
Once we are the postgres user, we can edit the crontab with crontab -e and add the following two entries:
50 23 * * * /home/pcadmin/automatic_backup/cron_backup.sh /home/pcadmin/automatic_backup/config.yaml > /tmp/backup.log
55 23 * * * /home/pcadmin/backup_venv/bin/python3 /home/pcadmin/automatic_backup/drive_upload.py -cf /home/pcadmin/automatic_backup/config.yaml >> /tmp/backup.log
If you wish to run the backup process daily at a given time, the cron prefix is 50 23 * * * for every day at 23:50 (the five fields are minute, hour, day of month, month, and day of week). Take a look at the crontab.md file, which is pretty close to what I am running on my linux server. Both scripts expect one parameter, which is the path to the config.yaml file. So make sure to adjust that path and also make sure that the paths inside config.yaml are correct.
When working with the crontab, environment variables are not set, meaning that something like python example.py won’t work; instead, you need to use absolute paths, as in the two crontab entries above.
Further, we need to make sure that the postgres user, who will execute the commands in the postgres crontab, has read, write and execute permissions on the directory containing our scripts, so run the following commands:
sudo chown postgres postgres_googledrive_backup
sudo chmod 700 postgres_googledrive_backup
This makes the postgres user the owner of that directory and gives only the owner read, write and execute permissions.
And that’s it. You can now, at an arbitrary interval, create backups of a postgres database table (or anything else if you adapt the backup-generating script) and upload those backups into a GoogleDrive cloud folder. Working with the crontab for the first time can be a bit cumbersome, because commands that work in your shell don’t necessarily work in the crontab. Make sure that all paths are correct and that the user whose crontab you are using (here the postgres user) has sufficient permissions on the folders it’s operating on, and you should be good to go.