Commons:Extracting images from PDF

From Wikimedia Commons, the free media repository
"COM:PDF" redirects here. For Commons guidelines on the use of PDF format, see Commons:Project_scope#PDF and DjVu formats.

PDF files can contain images that are actually at a higher resolution than the "100%" size of the document. Possible ways to extract images from PDFs include:

  • pdfimages, the command-line program in the xpdf package (also shipped with poppler-utils). Use the -j option to losslessly extract JPEG-compressed images as .jpg files.
  • Nitro PDF has a function to pull all images out of a PDF file at full resolution, and you can choose the output format (jpg, png, etc.). However, it won't work if the PDF is password-protected.
  • Evince, a widely used Linux PDF reader, simply lets you right-click on an image and save it.
  • Get pieces via PrintScreen and stitch them together in Microsoft Paint, GIMP, or a similar third-party program.
  • GIMP can also open pages from a PDF as an image at the resolution you specify. This is not quite the same as extracting the images: GIMP gives no guidance on the ideal resolution for a given image, and it renders the whole page before converting everything to a single image. In short, it is equivalent to the screenshot approach, but less work.
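
As an illustration of the pdfimages approach from the list above, here is a minimal Python sketch that wraps the command. The helper names (build_pdfimages_command, extract_images) and the "img" output prefix are hypothetical choices for this example; only the -j, -f (first page) and -l (last page) options of pdfimages are used, and the program is assumed to be installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, output_prefix, first_page=None, last_page=None):
    """Return the argument list for a pdfimages run that keeps JPEGs as-is."""
    cmd = ["pdfimages", "-j"]  # -j: write JPEG-compressed images as .jpg files
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to scan
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to scan
    cmd += [str(pdf_path), str(output_prefix)]
    return cmd


def extract_images(pdf_path, output_dir, **page_range):
    """Run pdfimages and return the files it wrote, sorted by name."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found on PATH")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # "img" is an arbitrary prefix; pdfimages appends a number and extension.
    subprocess.run(
        build_pdfimages_command(pdf_path, out / "img", **page_range),
        check=True,
    )
    return sorted(out.glob("img-*"))
```

Calling extract_images("document.pdf", "extracted", first_page=2, last_page=3) would, under these assumptions, write the images found on pages 2 and 3 into the "extracted" directory. Images that are not JPEG-compressed in the PDF are written in pdfimages' default formats rather than as .jpg.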

Some PDF readers can tell you the resolution; for documents created using typical “print quality” settings, 300 ppi is probably the best guess. (Caveat: where the originals are between 300 and 450 ppi, they are often not downsampled to the 300 ppi target; moreover, black-and-white “linework” images, one bit deep, are often kept at 1200 ppi or more.)
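
If your reader does not report the resolution, it can be worked out by hand. The sketch below assumes the default PDF unit of 72 points per inch; the function names and the sample numbers are illustrative only. It computes the effective ppi of an image from its pixel width and its placed width on the page, and the pixel size a page would have when rendered (for example in GIMP) at a chosen resolution.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit (assumed here)


def effective_ppi(pixel_width, placed_width_pt):
    """Pixels per inch of an image as placed on the page."""
    return pixel_width / (placed_width_pt / POINTS_PER_INCH)


def render_size_px(page_width_pt, page_height_pt, ppi):
    """Pixel dimensions of a whole page rendered at the given resolution."""
    width = round(page_width_pt * ppi / POINTS_PER_INCH)
    height = round(page_height_pt * ppi / POINTS_PER_INCH)
    return width, height


# A 2400-pixel-wide image placed 288 pt (4 inches) wide:
print(effective_ppi(2400, 288))       # 600.0
# A US Letter page (612 x 792 pt) rendered at 300 ppi:
print(render_size_px(612, 792, 300))  # (2550, 3300)
```

In the first example the embedded image holds twice the detail that a 300 ppi rendering of the page would capture, which is why extracting the image directly is preferable to rendering or taking screenshots.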

If the PDF is password-protected to prevent modification or extraction of content, you may be able to get around that by extracting the page with Inkscape and saving it as an unprotected file. Then either open that file in Adobe Acrobat and pass the image to Photoshop, or open it in Nitro PDF and pass the image to GIMP.