Downsampling PDFs to save space

Posted June 18, 2010

One of the best things since sliced bread, IMHO, is the automatic scan-to-email/PDF function on multi-function copier/printer/scanner/fax machines. It makes copying print articles easy, so you can send them to friends or keep a copy of an article from something you borrowed. My personal philosophy is “scan once, process as needed.” That means I scan at a high resolution and go from there.

Now, say you want to share that article (or whatever) with a friend, but the high-res PDF is too big to email, or they have a slow net connection… you get the idea. How do you shrink the PDF easily? Ghostscript is part of the answer. The other part is to write a script. The script below takes multiple input files, runs each of them through Ghostscript with its /screen preset, and writes the result with _small appended to the base filename. Most of the logic in the script is just parsing the file name and path so that _small lands between the base filename and the extension.
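For a single file, the Ghostscript call boils down to one command. Here is a minimal sketch of how that argument list can be put together. The helper name `gs_command` and the example file names are made up for illustration, and `gs` is assumed to be on the PATH.

```python
def gs_command(src, dst, gs='gs'):
    """Build the Ghostscript argument list that rewrites src as a
    screen-resolution PDF at dst. Nothing is executed here."""
    return [gs,
            '-sDEVICE=pdfwrite',       # write a PDF
            '-dPDFSETTINGS=/screen',   # low-resolution "screen" preset
            '-dBATCH', '-dNOPAUSE',    # run non-interactively, exit when done
            '-sOutputFile=' + dst,
            src]

if __name__ == '__main__':
    # Hypothetical file names; hand the list to subprocess.call() to run it.
    print(' '.join(gs_command('article.pdf', 'article_small.pdf')))
```

Keeping the command as a list rather than one long string means file names with spaces in them need no extra quoting when the list is passed to `subprocess.call()`.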

So, without further ado:

<a href="http://dl.curioussystem.com/scripts/pdf_downsample">pdf_downsample</a>
#! /usr/bin/env python
#This program takes a list of input filenames and
#converts them for output to a screen
#Copyright 2010, Chad Kidder
#Licensed under the GPL 3.0

import sys, re, subprocess, os
if len(sys.argv) < 2: #sys.argv[0] is the script itself
    print("Need at least one input file")
    sys.exit(1)

pcmd = r'/usr/bin/gs' #Full pathname of command
poptions = ['-sDEVICE=pdfwrite', '-sPAPERSIZE=legal', '-dBATCH', '-dNOPAUSE',
            '-dPDFSETTINGS=/screen', '-sOutputFile='] #Command line options
pathre = re.compile(r'(.*/)([^/]+)') #Regular Expression to parse filename
#This will fail if someone has a / in the file name.
fnmatch = re.compile(r'(.+)\.(?:PDF|pdf)$') #RE to parse extension off filename
ofn_ext = '_small.pdf' #what to append to base filename
for fn in sys.argv[1:]:
    cmd = pcmd
    options = poptions[:] #[:] copies the list so poptions stays unchanged
    pmatch = pathre.match(fn) #parse the path off the input file name
    if pmatch is None: #see if no path was found
        tfn = fn
        bpath = ''
    else: #If we found a path
        tfn = pmatch.groups()[1]
        bpath = pmatch.groups()[0]
    bname = fnmatch.match(tfn) #Parse base filename
    if bname is None: #Adding extension depending on if we
                      #Recognized the extension
        ofn = bpath+tfn+ ofn_ext
    else:
        ofn = bpath+bname.groups()[0]+ ofn_ext

    options[-1] = options[-1] + ofn #Adding output filename to last option
    ocmd = [cmd] +options + [fn] #putting it all together in one big list
    print(ocmd)
    subprocess.call(ocmd) #run the command
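
The path and extension parsing can also be handed off to the standard library. What follows is a sketch, not a drop-in replacement for the script above: `small_name` is a hypothetical helper, and unlike the regular expressions it accepts any capitalization of `.pdf` as the extension.

```python
import os.path

def small_name(path, suffix='_small.pdf'):
    """Return the output name for path: same directory, base name plus suffix."""
    root, ext = os.path.splitext(path)
    if ext.lower() == '.pdf':
        return root + suffix   # recognized extension: replace it
    return path + suffix       # anything else: append to the full name

if __name__ == '__main__':
    # Made-up example names, just to show the mapping.
    for name in ('scans/article.pdf', 'REPORT.PDF', 'notes'):
        print(name + ' -> ' + small_name(name))
```

`os.path.splitext` only looks at the last path component, so a dot in a directory name is not mistaken for an extension.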

Chad Kidder
