Jul 05

Automating Dreamhost backups

We here at Curious System Solutions use Dreamhost as our hosting provider.  One of the nice things they give us is a tidy backup every month, if we ask for it.  The backup may take a few days to arrive if you ask at the beginning of the month, and it is easy to forget to download.  So we have a handy Python script that checks an IMAP4 email server to see if the backup is ready and, if so, downloads it.  The script is designed to run as a nightly cron job so you don’t have to worry about remembering to download things.

The script is Linux-specific in that the location of wget is hard-coded into the script.  To use it on Windows, that path would need to be changed, but the script should otherwise work there.  Wget is set up to continue a prior download, which has the extra advantage of skipping a file that is already complete, so if you keep trying to download the same backup file set, you don’t download it again… and again… and again.
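
The resume behaviour comes from wget's -c (--continue) flag. Here is a minimal sketch of how such a command line can be put together. The helper name build_wget_command is made up for illustration, and the URL and credentials are placeholders rather than real values.

```python
def build_wget_command(wget_path, url, username, password):
    """Return an argument list that asks wget to resume (or skip) a download."""
    return [
        wget_path,
        "-c",                        # continue a partial file; a complete one is left alone
        "--http-user", username,     # HTTP basic auth user
        "--http-password", password, # HTTP basic auth password
        url,
    ]


# Placeholder values for illustration only
cmd = build_wget_command("/usr/bin/wget", "http://example.com/backup.zip", "user", "secret")
print(cmd)
```

Passing the arguments as a list (rather than one long string) means the password never goes through a shell, so odd characters in it should not need escaping.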

Please make sure you set the permissions on the file to 500 (read and execute for the owner only), because it has a plain-text email username and password in it.  Ideally, you would have a separate account set up for this, with the primary account that receives the notifications forwarding the relevant email to it.  That will provide a bit more security.
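
If you would rather set or check those permissions from Python than with chmod on the command line, something like the following sketch should do it. The function names lock_down and is_private are hypothetical, and the path in the usage comment is only an example.

```python
import os
import stat


def lock_down(path):
    """Set mode 500 (owner read + execute only) and return the resulting mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IXUSR)   # same value as 0o500
    return stat.S_IMODE(os.stat(path).st_mode)


def is_private(path):
    """True when neither group nor others have any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Example usage (the path is an assumption, use wherever you saved the script):
# lock_down("/home/me/dreamhost_backup.py")
```

This is aimed at Linux; on Windows os.chmod only controls the read-only flag, so the check is not meaningful there.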

The variables you will need to modify are storedir, imap_user, imap_password, imap_server, imap_port, dreamhost_account and numdaysold. You may also need to change wget if it lives somewhere else on your system. These are all located at the top of the script.
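
To give a feel for what numdaysold does, here is a small sketch of turning it into the cutoff date used in the IMAP SENTSINCE search. The script itself builds this date with strftime; the sketch swaps in a fixed table of month names so the result does not depend on the system locale. The name imap_since is made up for illustration.

```python
from datetime import datetime, timedelta

# IMAP dates use English month abbreviations regardless of the system locale
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_since(numdaysold, now=None):
    """Return the date numdaysold days before now, formatted as dd-Mon-yyyy."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=numdaysold)
    return "%02d-%s-%04d" % (cutoff.day, MONTHS[cutoff.month - 1], cutoff.year)


print(imap_since(31, datetime(2011, 7, 5)))   # prints 04-Jun-2011
```

With the default of 31 days, a nightly run keeps finding the most recent monthly notification until the next one arrives.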


#! /usr/bin/env python

#(C) 2011 Chad Kidder
#This script is for pulling down a backup set from dreamhost

import imaplib
from datetime import datetime
import time
import os

storedir = '/tmp'   #base directory for downloaded files
imap_server = 'imap.gmail.com'
imap_port = 993
imap_user = 'xxxx'
imap_password = 'xxxx'
numdaysold = 31 #Number of days worth of mail to search for backup success email
dreamhost_account = 12345   #Your dreamhost account number, used for matching email subject
wget = "/usr/bin/wget"  #location of wget on your system

#make sure the storage directory exists before going any further
try:
    os.chdir(storedir)  #does the storage directory already exist
except OSError:     #will fail if dir does not exist
    os.mkdir(storedir)  #if not, make it

#logging into imap server
M=imaplib.IMAP4_SSL(imap_server, imap_port)
M.login(imap_user, imap_password)
status, count = M.select('Inbox')

#These next lines search the inbox for an email from dreamhost
since = datetime.strftime(datetime.fromtimestamp(time.time()-(3600*24*numdaysold)),r"%d-%b-%Y")
subject = 'Your account (#%i) has been backed up!' % dreamhost_account
typ, msgnums = M.search(None, 'FROM', '"dreamhost"', 'SUBJECT', '"%s"' % subject, 'SENTSINCE', '"%s"' % since)

#change the returned data into an array of integers
msgnuml = [int(x) for x in msgnums[0].split()]
if not msgnuml:     #no backup email yet, so log out and try again another night
    M.logout()
    raise SystemExit(0)

#fetch the body of the newest (highest numbered) email
status, data = M.fetch(str(max(msgnuml)), '(UID BODY[TEXT])')
#close the imap connection
M.close()
M.logout()

import re
import subprocess
import sys

try:
    #pull the pertinent info out of the email body, otherwise, we exit
    body = data[0][1]
    if not isinstance(body, str):   #newer pythons return bytes, so decode first
        body = body.decode('utf-8', 'replace')
    url = re.search(r'(http://\S+)',body,re.M).groups()[0]
    username = re.search(r'username:\s+(\S+)',body,re.M).groups()[0]
    password = re.search(r'password:\s+(\S+)',body,re.M).groups()[0]
    bdate = re.search(r'(\d{4}-\d{2}-\d{2})',url).groups()[0]   #get the backup date
    try:
        os.chdir(storedir+'/'+bdate)    #changing to date specific backup directory
    except OSError:     #will fail if dir does not exist
        os.mkdir(storedir+'/'+bdate)    #if not, make it

    #we know there are two subdirs that have the files we need
    #we know there are users and mysql
    subdirs = [('users', 'zip'), ('mysql','gz')]
    for sdir, ext in subdirs:
        try:
            os.chdir(storedir+'/'+bdate+'/'+sdir)   #changing to subdirectory
        except OSError:     #will fail if dir does not exist
            os.mkdir(storedir+'/'+bdate+'/'+sdir)   #if not, make it
            os.chdir(storedir+'/'+bdate+'/'+sdir)   #then move into it
        #getting command ready to pull down different subdirectories
        execinfo = [wget, "-q", "-O", "-", "--http-user", username, "--http-password", password, url+'/'+sdir]
        ret = subprocess.Popen(execinfo, stdout=subprocess.PIPE).communicate()[0]
        if not isinstance(ret, str):    #same bytes handling as the email body
            ret = ret.decode('utf-8', 'replace')
        #find all the archive files linked to in the subdirectory
        dls = re.findall(r'href="(\S+?\.%s)">' % ext, ret, re.M)
        print(dls)
        for fn in dls:
            #getting command ready to pull down archives
            execinfo = [wget, "-c", "--http-user", username, "--http-password", password, url+'/'+sdir+'/'+fn]
            print(execinfo)
            #pulling down archives
            subprocess.call(execinfo)
except Exception:
    #report what went wrong on stderr
    sys.stderr.write('Encountered an error:\t')
    sys.stderr.write('\t'.join([str(x) for x in sys.exc_info()])+'\n')
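
If you want to try the email parsing on its own before pointing the script at a real mailbox, here is a rough sketch that uses the same regular expressions as the script. The function name parse_backup_email is hypothetical, and the sample message is made up; a real Dreamhost notification will be worded differently.

```python
import re


def parse_backup_email(body):
    """Extract (url, username, password, backup_date) from the message text.

    Returns None when any piece is missing, so the caller can simply stop.
    """
    url = re.search(r'(http://\S+)', body)
    username = re.search(r'username:\s+(\S+)', body)
    password = re.search(r'password:\s+(\S+)', body)
    if not (url and username and password):
        return None
    bdate = re.search(r'(\d{4}-\d{2}-\d{2})', url.group(1))
    if bdate is None:
        return None
    return url.group(1), username.group(1), password.group(1), bdate.group(1)


# Made-up message text for illustration; a real notification will differ
sample = (
    "Your backup is ready at http://backups.example.com/2011-07-01/\n"
    "username: exampleuser\n"
    "password: examplepass\n"
)
print(parse_backup_email(sample))
```

Returning None instead of raising keeps the "nothing to do tonight" case separate from real errors, which is handy when the output of a cron job ends up in your inbox.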

Permanent link to this article: http://blog.curioussystem.com/2011/07/automating-dreamhost-backups/

1 comment

  1. Scott

    Hi, I’m trying your script out and am getting an error.

    Encountered an error: Traceback (most recent call last):
    File "backup.py", line 83, in <module>
    map(lambda x:os.sys.stderr.write(str(x)+'\t'), sys.exc_info())
    NameError: name 'sys' is not defined

    any ideas?
