Introduction to Software
Software for book-scanning projects can be grouped by the five main stages of the scanning workflow:
- Acquisition: Use the camera to acquire images.
- Post-processing: Rotate, deskew, correct keystoning, crop, and remove other distortions from the images.
- OCR (optional): Use Optical Character Recognition to extract text from scanned images.
- Touchup: Make any necessary manual touchups to the images.
- Compression: Compress and finalize the images.
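Keystone correction, listed under post-processing above, is a perspective transform: once the four corners of the page have been located in the photo, a 3x3 homography maps them onto an upright rectangle. The sketch below is a minimal, dependency-free illustration of that general technique. It assumes the corners are already known (found by hand or by some detector), it is not taken from any tool listed on this page, and the function names are hypothetical.

```python
"""Minimal sketch: keystone (perspective) correction as a homography.

Assumption: the four page corners have already been located in the photo;
this code only computes and applies the corner-to-rectangle mapping.
"""
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H (with h33 = 1) mapping each src corner onto its dst corner."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # Same for v with h3, h4, h5 in the numerator.
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply(h: List[List[float]], p: Point) -> Point:
    """Map one point through H, dividing by the projective coordinate w."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


if __name__ == "__main__":
    # Hypothetical corners of a keystoned page in a camera photo (pixels),
    # ordered top-left, top-right, bottom-right, bottom-left.
    photo = [(102, 40), (910, 75), (940, 1260), (80, 1225)]
    page = [(0, 0), (850, 0), (850, 1100), (0, 1100)]
    H = homography(photo, page)
    print([tuple(round(c, 3) for c in apply(H, p)) for p in photo])
```

To produce a corrected image, a full implementation would invert H and, for every pixel of the output rectangle, sample the photo at the mapped location (usually with interpolation). Libraries such as OpenCV provide this ready-made through `getPerspectiveTransform` and `warpPerspective`.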
Software Directory
Name | OS | UI | Cost | Acquisition | Post-processing | Touchup | Compression | URL | Notes |
---|---|---|---|---|---|---|---|---|---|
CHDK | firmware | TUI | free | yes | no | no | no | http://chdk.wikia.com/ | A key software package in DIY Book Scanning |
Stereo Data Maker | firmware | TUI | free | yes | no | no | no | http://stereo.jpn.org/eng/sdm/index.htm | forum thread |
Scan Tailor | Linux, Windows | GUI/Qt | free | no | yes | yes | no | http://scantailor.sourceforge.net/ | From the website: ''Scan Tailor is an interactive post-processing tool for scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. You give it raw scans, and you get pages ready to be printed or assembled into a PDF or DJVU file. Scanning, optical character recognition, and assembling multi-page documents are out of scope of this project.'' |
BookScanWizard | Java | ? | free | ? | yes | yes | yes | http://sourceforge.net/projects/bookscanwizard/ | From the website: ''A utility to help with Book scanning using cameras as a scanner. It will automate things such as cropping, rotating, fixing keystoning, fixing the DPI, and outputing it to tiff files that can be changed into PDF's or ebooks.'' |
Bookbuilder | Java | command-line | free | no | yes | no | yes | http://fynder.de/article/freeware-bookbuilder-einfacher-buchscanner-mit-hardware-ab-180-euro-35.html (German) | Digitizes books with a digital camera: a set of book photos taken against a dark background can be auto-extracted and converted to PDF (with OCR layer support) with a single command |
Page Builder | Windows | ? | free | no | yes | no | no | http://www.diybookscanner.org/forum/viewtopic.php?f=3&t=27 | ''Presently unmaintained.'' Aaron's PageBuilder is an open-source, MATLAB-based cropper with user-specified cropping. Requires MATLAB. From blog post. |
YAPP | ? | ? | ? | no | yes | no | no | Spamsickle's Yet Another Page Plucker (YAPP) | Spamsickle's Yet Another Post Processor (YAPP) is a quick'n'dirty attempt at a lightweight processor. Also from blog post. |
PostProcessor by Rob | ? | ? | ? | ? | yes | yes | yes | http://www.diybookscanner.org/forum/viewtopic.php?f=3&t=34 (PostProcessor Version 1 by Rob) | |
gscan2pdf | Linux | GUI | free | yes | yes | no | yes | http://gscan2pdf.sourceforge.net/ | Can produce PDF and DjVu files. From the website: ''gscan2pdf - A GUI to produce PDFs or DjVus from scanned documents'' |
djvubind | Linux, Mac, Windows | command-line | free | no | no | no | yes | https://code.google.com/p/djvubind/ | From the website: ''Djvubind facilitates creating high-quality djvu files, especially digital versions of scanned books. It functions as a wrapper that combines the djvulibre tools, minidjvu, and various ocr engines to provide a simple, single command creation of a djvu file. It is fully supported on Linux and Mac, and works in Windows as well. Prior to djvubind, we highly recommend using Scantailor to process your scanned images.'' |
Bindery | ? | ? | ? | ? | ? | ? | ? | Download from GitHub | |
PDFMaker | ? | ? | ? | ? | ? | ? | ? | ZIP file | Development discontinued; see the forum thread. |
PDFBeads | ? | ? | ? | ? | ? | ? | ? | Download from RubyForge | See the forum thread and the (Google-translated) User's Guide. |
Sigil | Windows, Linux and Mac | GUI | free | no | no | no | no | Sigil | From the website: ''Sigil is a multi-platform WYSIWYG ebook editor. It is designed to edit books in ePub format. Free and open source software under GPLv3'' |
Calibre | Windows, Linux and Mac | GUI | free | no | no | no | no | Calibre | From the website: ''Calibre is a free and open source e-book library management application developed by users of e-books for users of e-books. It has a cornucopia of features.'' |
Comix | Linux, BSD and virtually any other UNIX-like OS | GUI | free | no | no | no | no | Comix | From the website: ''Comix is a user-friendly, customizable image viewer. It is specifically designed to handle comic books, but also serves as a generic viewer. It reads images in ZIP, RAR or tar archives (also gzip or bzip2 compressed) as well as plain image files. It is written in Python and uses GTK+ through the PyGTK bindings.'' |
ImageMagick | Linux, Windows, Mac | CLI | free | no | yes | no | yes | http://www.imagemagick.org/ | |
pdftk | Linux, Windows, Mac | CLI | free | no | yes | no | yes | http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/ | |
pdfopt | Linux | CLI | free | no | no | no | yes | https://github.com/zackw/pdfopt | |
spreads | Linux | GUI | free | yes | yes | yes | yes | https://github.com/DIYBookScanner/spreads | |
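ImageMagick and pdftk are command-line tools, so post-processing and compression steps are easy to batch from a script. The following is a minimal Python sketch, assuming ImageMagick 6 (whose binary is `convert`; ImageMagick 7 calls it `magick`) and pdftk are installed and on PATH. The helper names and file names are hypothetical, and `-deskew 40%` is the starting threshold the ImageMagick documentation suggests for most images, not a value tuned for book scans.

```python
"""Batch helpers for ImageMagick and pdftk (sketch; installed tools assumed)."""
import shutil
import subprocess
from typing import List, Sequence


def deskew_cmd(src: str, dst: str, threshold: str = "40%") -> List[str]:
    # -deskew straightens a slightly rotated scan; +repage discards the
    # virtual-canvas offset that the rotation leaves behind.
    return ["convert", src, "-deskew", threshold, "+repage", dst]


def merge_cmd(pdfs: Sequence[str], out: str) -> List[str]:
    # pdftk's "cat" operation concatenates the input files in the order given.
    return ["pdftk", *pdfs, "cat", "output", out]


def run(cmd: List[str]) -> None:
    # Fail early with a clear message if the tool is not installed.
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, `run(deskew_cmd("p001.jpg", "p001.tif"))` would straighten one page, and `run(merge_cmd(["ch1.pdf", "ch2.pdf"], "book.pdf"))` would join two chapter PDFs into one file. Building the argument lists separately from running them keeps the commands easy to inspect or log before anything touches the scans.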
OCR packages
Many packages are available for OCR; here is a quick list of links to get you started:
- tesseract http://code.google.com/p/tesseract-ocr/
- OCRopus http://code.google.com/p/ocropus/
- Cuneiform http://en.openocr.org/ https://launchpad.net/cuneiform-linux
- gocr http://jocr.sourceforge.net/index.html
- ClaraOCR http://www.claraocr.org/
- Kooka http://kooka.kde.org/index.php
- ocrad http://savannah.gnu.org/projects/ocrad/
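Of the packages above, tesseract is driven from the command line as `tesseract imagename outputbase [options] [configfile]`, which makes batch OCR of a folder of post-processed pages straightforward. A minimal sketch, assuming a tesseract version with PDF output (3.03 or later) is on PATH with English (`eng`) language data installed; the helper names and folder layout are hypothetical. With no config, tesseract writes plain text; passing the `pdf` config produces a searchable PDF instead.

```python
"""Batch OCR with the tesseract CLI (sketch; installation assumed)."""
import subprocess
from pathlib import Path
from typing import List, Sequence


def ocr_cmd(image: Path, lang: str = "eng", configs: Sequence[str] = ()) -> List[str]:
    # tesseract appends the extension itself: outputbase "p001" -> p001.txt,
    # or p001.pdf when the "pdf" config is given.
    outbase = image.with_suffix("")
    return ["tesseract", str(image), str(outbase), "-l", lang, *configs]


def ocr_folder(folder: Path, pattern: str = "*.tif", lang: str = "eng",
               configs: Sequence[str] = ()) -> List[Path]:
    # Process pages in file-name order so the output order matches the book.
    pages = sorted(folder.glob(pattern))
    for page in pages:
        subprocess.run(ocr_cmd(page, lang, configs), check=True)
    return pages
```

For example, `ocr_folder(Path("scans"), configs=("pdf",))` would write one searchable PDF per page image next to the originals, ready to be merged with a tool such as pdftk.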
Commercial OCR packages are also available:
- ABBYY FineReader: see this forum thread
- Nuance OmniPage: see OmniPage Professional
Uploading Public Domain Works
More resources
Software category on the DIYBookScanner blog
page revision: 8, last edited: 17 Dec 2015 14:34