Introduction to Software

Software for scanning projects can be understood in terms of the five main workflow stages:

  1. Acquisition: Use the camera to acquire images.
  2. Post-processing: Rotate, de-skew, correct keystoning, crop, and remove other distortions from the images.
  3. OCR (optional): Use Optical Character Recognition to extract text from scanned images.
  4. Touchup: Make any necessary manual touchups to the images.
  5. Compression: Final compression and finishing of the images.
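
The stages above can be sketched as an ordered pipeline in which only OCR may be skipped. This is a minimal illustration, not part of any package listed on this page: the names `STAGES`, `plan`, and `run_pipeline` are hypothetical, and the handlers are placeholders standing in for whatever tool performs each stage.

```python
from typing import Callable, Dict, List

# The five workflow stages in order, as (name, optional) pairs.
# Only OCR is marked optional, matching the list above.
STAGES = [
    ("acquisition", False),
    ("postprocessing", False),
    ("ocr", True),
    ("touchup", False),
    ("compression", False),
]


def plan(include_ocr: bool = True) -> List[str]:
    """Return the stage names to run, in workflow order."""
    return [name for name, optional in STAGES if include_ocr or not optional]


def run_pipeline(
    pages: List[str],
    handlers: Dict[str, Callable[[List[str]], List[str]]],
    include_ocr: bool = True,
) -> List[str]:
    """Pass the page list through each planned stage's handler in turn."""
    for stage in plan(include_ocr):
        pages = handlers[stage](pages)
    return pages


if __name__ == "__main__":
    # Placeholder handlers (hypothetical): each one only tags the page name
    # with the stage it passed through.
    def tag(stage: str) -> Callable[[List[str]], List[str]]:
        return lambda pages: [f"{p}+{stage}" for p in pages]

    handlers = {name: tag(name) for name, _ in STAGES}
    print(plan(include_ocr=False))
    print(run_pipeline(["page001"], handlers, include_ocr=False))
```

In a real project each handler would call out to one of the tools in the directory below; keeping the stage order in one place makes it easy to see which stage a given tool is expected to fill.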

Software Directory

| Name | OS | UI | Cost | Acquisition | Post-processing | Touchup | Compression | URL / Notes |
|---|---|---|---|---|---|---|---|---|
| CHDK | firmware | TUI | free | yes | no | no | no | A key software package in DIY book scanning. |
| Stereo Data Maker | firmware | TUI | free | yes | no | no | no | Forum thread. |
| Scan Tailor | Linux, Windows | GUI/Qt | free | no | yes | yes | no | From the website: "Scan Tailor is an interactive post-processing tool for scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. You give it raw scans, and you get pages ready to be printed or assembled into a PDF or DJVU file. Scanning, optical character recognition, and assembling multi-page documents are out of scope of this project." |
| BookScanWizard | Java | ? | free | ? | yes | yes | yes | From the website: "A utility to help with Book scanning using cameras as a scanner. It will automate things such as cropping, rotating, fixing keystoning, fixing the DPI, and outputing it to tiff files that can be changed into PDF's or ebooks." |
| Bookbuilder | Java | CLI | free | no | yes | no | yes | (German) Digitizes books photographed with a digital camera: a set of book photos on a dark background can be auto-extracted and converted to PDF (with OCR layer support) with a single command. |
| Page Builder | Windows | ? | free | no | yes | no | no | From blog post: "PRESENTLY UNMAINTAINED." Aaron's PageBuilder is an open-source, MATLAB-based cropper with user-specified cropping. Requires MATLAB. |
| YAPP | ? | ? | ? | no | yes | no | no | Spamsickle's Yet Another Page Plucker (YAPP). Spamsickle's Yet Another Post Processor (YAPP) is a quick-and-dirty shot at a lightweight processor. Also from blog post. |
| PostProcessor by Rob | ? | ? | ? | ? | yes | yes | yes | PostProcessor Version 1 by Rob. |
| gscan2pdf | Linux | ? | free | yes | yes | no | yes | For Linux. Can produce PDF and DjVu files. From the website: "gscan2pdf - A GUI to produce PDFs or DjVus from scanned documents" |
| djvubind | Linux, Mac, Windows | CLI | free | no | no | no | yes | From the website: "Djvubind facilitates creating high-quality djvu files, especially digital versions of scanned books. It functions as a wrapper that combines the djvulibre tools, minidjvu, and various ocr engines to provide a simple, single command creation of a djvu file. It is fully supported on Linux and Mac, and works in Windows as well. Prior to djvubind, we highly recommend using Scantailor to process your scanned images." |
| Bindery | ? | ? | ? | ? | ? | ? | ? | Download from GitHub. |
| PDFMaker | ? | ? | ? | ? | ? | ? | ? | ZIP file; see this thread in forum (development discontinued). |
| PDFBeads | ? | ? | ? | ? | ? | ? | ? | Download from RubyForge; see this thread in forum; (Google-translated) User's Guide. |
| Sigil | Windows, Linux, Mac | GUI | free | no | no | no | no | From the website: "Sigil is a multi-platform WYSIWYG ebook editor. It is designed to edit books in ePub format. Free and open source software under GPLv3" |
| Calibre | Windows, Linux, Mac | GUI | free | no | no | no | no | From the website: "Calibre is a free and open source e-book library management application developed by users of e-books for users of e-books. It has a cornucopia of features." |
| Comix | Linux, BSD, and virtually any other UNIX-like OS | GUI | free | no | no | no | no | From the website: "Comix is a user-friendly, customizable image viewer. It is specifically designed to handle comic books, but also serves as a generic viewer. It reads images in ZIP, RAR or tar archives (also gzip or bzip2 compressed) as well as plain image files. It is written in Python and uses GTK+ through the PyGTK bindings." |
| ImageMagick | Linux, Windows, Mac | CLI | free | no | yes | no | yes | |
| pdftk | Linux, Windows, Mac | CLI | free | no | yes | no | yes | |
| pdfopt | Linux | CLI | free | no | no | no | yes | |
| spreads | Linux | GUI | free | yes | yes | yes | yes | |
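
One way to use the directory above is to check whether a chosen set of tools covers every stage it tracks. The sketch below is illustrative only: the names `DIRECTORY`, `ALL_STAGES`, `tools_for`, and `missing_stages` are hypothetical, and the data is a hand-copied subset of the yes/no columns above, so it should be rechecked against the directory before being relied on.

```python
from typing import Dict, List, Set

# The four capability columns tracked by the directory.
ALL_STAGES: Set[str] = {"acquisition", "post-processing", "touchup", "compression"}

# A subset of the directory: tool name -> stages marked "yes".
# Rows with "?" entries are left out because their coverage is unknown.
DIRECTORY: Dict[str, Set[str]] = {
    "CHDK": {"acquisition"},
    "Scan Tailor": {"post-processing", "touchup"},
    "gscan2pdf": {"acquisition", "post-processing", "compression"},
    "djvubind": {"compression"},
    "ImageMagick": {"post-processing", "compression"},
    "spreads": {"acquisition", "post-processing", "touchup", "compression"},
}


def tools_for(stage: str) -> List[str]:
    """List the tools marked "yes" for a stage, sorted case-insensitively."""
    names = [name for name, stages in DIRECTORY.items() if stage in stages]
    return sorted(names, key=str.lower)


def missing_stages(toolchain: List[str]) -> Set[str]:
    """Return the stages that no tool in the chosen toolchain covers."""
    covered: Set[str] = set()
    for tool in toolchain:
        covered |= DIRECTORY[tool]
    return ALL_STAGES - covered


if __name__ == "__main__":
    print(tools_for("touchup"))
    print(missing_stages(["CHDK", "djvubind"]))
```

For example, the combination CHDK, Scan Tailor, and djvubind leaves no stage uncovered in this subset, while CHDK and djvubind alone leave post-processing and touchup unfilled.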

OCR packages

There are many packages available for OCR; here is a quick list of links to get you started:

There are also commercial OCR packages available:

Uploading Public Domain Works

More resources

