
Jun 18

Downsampling PDFs to save space

One of the best things since sliced bread, IMHO, is the automatic scan-to-email/PDF function on a multi-function copier/printer/scanner/fax. It makes it easy to copy print articles so that you can send them to friends, or keep a copy of an article from something you borrowed. My personal philosophy is “scan once, process as needed.” That means I scan at a high resolution and go from there.

Now, say you want to share that article (or whatever) with a friend… and the high-res PDF is too big to email… or they have a slow net connection… you get the idea. How do you shrink the PDF easily? Ghostscript is part of the answer. The other part is to write a script. The script below takes multiple input files, runs each of them through Ghostscript with its screen settings (-dPDFSETTINGS=/screen), and writes the result with _small appended to the base filename. Most of the logic in the script is just parsing the file name and path so that the _small suffix lands in the right place, ahead of the extension.

So, without further ado:

<a href="http://dl.curioussystem.com/scripts/pdf_downsample">pdf_downsample</a>
#! /usr/bin/env python
#This program takes a list of input filenames and
#converts them for output to a screen
#Copyright 2010, Chad Kidder
#Licensed under the GPL 3.0

import sys, re, subprocess
if len(sys.argv) < 2:            #sys.argv[0] is the script name itself
    print("Need at least one input file")
    sys.exit(1)

pcmd = r'/usr/bin/gs'            #Full pathname of command
poptions = ['-sDEVICE=pdfwrite', '-sPAPERSIZE=legal', '-dBATCH', '-dNOPAUSE',
    '-dPDFSETTINGS=/screen', '-sOutputFile=']         #Command line options
pathre = re.compile(r'(.*/)([^/]+)')            #Regular Expression to split path from filename
    #This will fail if someone has a / in the file name.
fnmatch = re.compile(r'(.+)\.pdf$', re.IGNORECASE)       #RE to parse extension off filename
ofn_ext = '_small.pdf'         #what to append to base filename
for fn in sys.argv[1:]:
    cmd = pcmd
    options = poptions[:]        #Need the [:] to copy the list so poptions stays untouched
    pmatch = pathre.match(fn)         #parse the path off the input file name
    if pmatch is None:      #see if no path was found
        tfn = fn
        bpath = ''
    else:       #If we found a path
        bpath, tfn = pmatch.groups()
    bname = fnmatch.match(tfn)      #Parse base filename (must run for both branches above)
    if bname is None:        #Adding extension depending on if we
    #Recognized the extension
        ofn = bpath + tfn + ofn_ext
    else:
        ofn = bpath + bname.group(1) + ofn_ext

    options[-1] = options[-1] + ofn #Adding output filename to last option
    ocmd = [cmd] + options + [fn]              #putting it all together in one big list
    print(ocmd)
    subprocess.call(ocmd)         #run the command
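
The file name parsing in the script could also be done with os.path.splitext from the Python standard library instead of regular expressions. What follows is only a rough sketch of that alternative, not what the script above actually does; the function name small_name is made up for illustration. It keeps the same rule as the script: strip a .pdf/.PDF extension if there is one, otherwise tack the suffix onto the whole name.

```python
import os

SUFFIX = "_small.pdf"  # same suffix the script appends to the base filename


def small_name(path):
    """Return the output path for a downsampled copy of `path`.

    Hypothetical helper, not part of the original script.
    """
    # splitext only looks at the last path component, so directory
    # names containing dots are left alone.
    root, ext = os.path.splitext(path)
    if ext.lower() == ".pdf":
        return root + SUFFIX      # recognized extension: replace it
    return path + SUFFIX          # unrecognized: append to the full name


if __name__ == "__main__":
    for example in ["/tmp/scans/article.PDF", "article.pdf", "notes"]:
        print(example, "->", small_name(example))
```

The upside is that there is no regular expression to get wrong; the downside is one more function to read if you just want a quick script.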

Permanent link to this article: http://blog.curioussystem.com/2010/06/downsampling-pdfs-to-save-space/
