Introduction to Software

Software for scanning projects can be understood in terms of the five main workflow stages:

  1. Acquisition: Use the camera to acquire images.
  2. Postprocessing: Rotate, de-skew, correct keystone distortion, crop, and remove other distortions from the images.
  3. OCR (optional): Use Optical Character Recognition to extract text from scanned images.
  4. Touchup: Make any necessary manual touchups to the images.
  5. Compression: Final compression and finishing of the images.
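
As a rough illustration of how these stages chain together, here is a minimal Python sketch. It is not taken from any of the packages listed on this page. The `Page` type, the `make_stage` helper, and the stage labels are hypothetical names, and each stage is a placeholder that only records that it ran. The one detail it tries to capture is that OCR is optional while the other four stages always run in order.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """One scanned page moving through the workflow (hypothetical type)."""
    name: str
    history: List[str] = field(default_factory=list)


# A stage takes a page and returns the (possibly modified) page.
Stage = Callable[[Page], Page]


def make_stage(label: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(page: Page) -> Page:
        page.history.append(label)
        return page
    return stage


def build_pipeline(with_ocr: bool = True) -> List[Stage]:
    """Return the workflow stages in order; OCR is the only optional one."""
    stages = [make_stage("acquisition"), make_stage("postprocessing")]
    if with_ocr:
        stages.append(make_stage("ocr"))
    stages += [make_stage("touchup"), make_stage("compression")]
    return stages


def run(page: Page, stages: List[Stage]) -> Page:
    """Pass one page through every stage in sequence."""
    for stage in stages:
        page = stage(page)
    return page


if __name__ == "__main__":
    result = run(Page("page_001.jpg"), build_pipeline(with_ocr=False))
    print(result.history)
```

In a real project each placeholder would be replaced by a call into one of the tools from the directory, for example a camera-control script for acquisition and a post-processing tool for the second stage.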

Software Directory

| Name | OS | UI | Cost | Acquisition | Post-processing | Touchup | Compression | URL | Notes |
|---|---|---|---|---|---|---|---|---|---|
| CHDK | firmware | TUI | free | yes | no | no | no | http://chdk.wikia.com/ | A key software package in DIY book scanning |
| Stereo Data Maker | firmware | TUI | free | yes | no | no | no | http://stereo.jpn.org/eng/sdm/index.htm | forum thread |
| Scan Tailor | Linux, Windows | GUI (Qt) | free | no | yes | yes | no | http://scantailor.sourceforge.net/ | From the website: "Scan Tailor is an interactive post-processing tool for scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. You give it raw scans, and you get pages ready to be printed or assembled into a PDF or DJVU file. Scanning, optical character recognition, and assembling multi-page documents are out of scope of this project." |
| BookScanWizard | Java | ? | free | ? | yes | yes | yes | http://sourceforge.net/projects/bookscanwizard/ | From the website: "A utility to help with Book scanning using cameras as a scanner. It will automate things such as cropping, rotating, fixing keystoning, fixing the DPI, and outputing it to tiff files that can be changed into PDF's or ebooks." |
| Bookbuilder | Java | CLI | free | no | yes | no | yes | http://fynder.de/article/freeware-bookbuilder-einfacher-buchscanner-mit-hardware-ab-180-euro-35.html | (German) Digitizing books with a digital camera: a set of book photos on a dark background can be auto-extracted and converted to PDF (OCR layer support) with a single command |
| Page Builder | Windows | ? | free | no | yes | no | no | http://www.diybookscanner.org/forum/viewtopic.php?f=3&t=27 | From blog post: "PRESENTLY UNMAINTAINED". Aaron’s PageBuilder is an open-source, MATLAB-based cropper with user-specified cropping. Requires MATLAB. |
| YAPP | ? | ? | ? | no | yes | no | no | Spamsickle’s Yet Another Page Plucker (YAPP) | Spamsickle’s Yet Another Post Processor (YAPP) is a quick’n’dirty shot at a lightweight processor. Also from blog post |
| PostProcessor by Rob | ? | ? | ? | ? | yes | yes | yes | http://www.diybookscanner.org/forum/viewtopic.php?f=3&t=34 | PostProcessor Version1 by Rob |
| gscan2pdf | Linux | ? | free | yes | yes | no | yes | http://gscan2pdf.sourceforge.net/ | Can produce PDF and DjVu files. From the website: "gscan2pdf - A GUI to produce PDFs or DjVus from scanned documents" |
| djvubind | Linux, Mac, Windows | CLI | free | no | no | no | yes | https://code.google.com/p/djvubind/ | From the website: "Djvubind facilitates creating high-quality djvu files, especially digital versions of scanned books. It functions as a wrapper that combines the djvulibre tools, minidjvu, and various ocr engines to provide a simple, single command creation of a djvu file. It is fully supported on Linux and Mac, and works in Windows as well. Prior to djvubind, we highly recommend using Scantailor to process your scanned images." |
| Bindery | ? | ? | ? | ? | ? | ? | ? | Download from github | |
| PDFMaker | ? | ? | ? | ? | ? | ? | ? | ZIP-file | See this thread in forum (development discontinued) |
| PDFBeads | ? | ? | ? | ? | ? | ? | ? | Download from rubyforge | See this thread in forum; (google-translated) User's Guide |
| Sigil | Windows, Linux, Mac | GUI | free | no | no | no | no | Sigil | From the website: "Sigil is a multi-platform WYSIWYG ebook editor. It is designed to edit books in ePub format. Free and open source software under GPLv3" |
| Calibre | Windows, Linux, Mac | GUI | free | no | no | no | no | Calibre | From the website: "Calibre is a free and open source e-book library management application developed by users of e-books for users of e-books. It has a cornucopia of features." |
| Comix | Linux, BSD and virtually any other UNIX-like OS | GUI | free | no | no | no | no | Comix | From the website: "Comix is a user-friendly, customizable image viewer. It is specifically designed to handle comic books, but also serves as a generic viewer. It reads images in ZIP, RAR or tar archives (also gzip or bzip2 compressed) as well as plain image files. It is written in Python and uses GTK+ through the PyGTK bindings." |
| ImageMagick | Linux, Windows, Mac | CLI | free | no | yes | no | yes | http://www.imagemagick.org/ | |
| pdftk | Linux, Windows, Mac | CLI | free | no | yes | no | yes | http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/ | |
| pdfopt | Linux | CLI | free | no | no | no | yes | https://github.com/zackw/pdfopt | |
| spreads | Linux | GUI | free | yes | yes | yes | yes | https://github.com/DIYBookScanner/spreads | |
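
One way to use the directory is to check whether a chosen set of tools covers every stage of the workflow. The following Python sketch copies a subset of the rows above into a dictionary and offers two lookups. The `DIRECTORY` structure and the function names `tools_for` and `missing_stages` are hypothetical and only illustrate the idea. A value of `None` stands for a "?" entry in the table and is treated as not confirmed.

```python
from typing import Dict, List, Optional

STAGES = ["acquisition", "postprocessing", "touchup", "compression"]

# Subset of the directory rows; None means "?" (unknown) in the table.
DIRECTORY: Dict[str, Dict[str, Optional[bool]]] = {
    "CHDK":           dict(zip(STAGES, [True, False, False, False])),
    "Scan Tailor":    dict(zip(STAGES, [False, True, True, False])),
    "BookScanWizard": dict(zip(STAGES, [None, True, True, True])),
    "gscan2pdf":      dict(zip(STAGES, [True, True, False, True])),
    "djvubind":       dict(zip(STAGES, [False, False, False, True])),
    "ImageMagick":    dict(zip(STAGES, [False, True, False, True])),
    "spreads":        dict(zip(STAGES, [True, True, True, True])),
}


def tools_for(stage: str) -> List[str]:
    """Return the tools that the table marks 'yes' for the given stage."""
    return sorted(name for name, caps in DIRECTORY.items() if caps[stage] is True)


def missing_stages(chosen: List[str]) -> List[str]:
    """Return the stages that none of the chosen tools is marked 'yes' for."""
    return [
        stage for stage in STAGES
        if not any(DIRECTORY[name][stage] is True for name in chosen)
    ]


if __name__ == "__main__":
    print(tools_for("acquisition"))
    print(missing_stages(["CHDK", "Scan Tailor", "djvubind"]))
```

Treating unknown entries as unconfirmed keeps the check conservative: a tool with a "?" in a column is never counted as covering that stage.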

OCR packages

Many OCR packages are available. Here is a quick list of links to get you started:

Commercial OCR packages are also available:

Uploading Public Domain Works

More resources

Software category on the DIYBookScanner blog

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License