Step 2: Post-processing with Scan Tailor


Scan Tailor is a powerful tool for improving the quality of scanned images: it makes them more readable and closer to a printed page than to a mere digital photo of a book. Its developer, Joseph Artsimovich, defines it as

An interactive post-processing tool for scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. You give it raw scans, and you get pages ready to be printed or assembled into a PDF[6].

We will not try to explain in detail how to use such a complex program here. For that, there is already the Scan Tailor Wiki page, which includes a User Guide as well as video tutorials. We will only give a few brief instructions for obtaining acceptable results without too much trouble. The best place to start is the “Quick Start Guide” on the same official wiki; what follows is essentially a paraphrase of it.

  1. Launch Scan Tailor and select ‘New project’ to begin with a new set of images.
  2. Browse to select the directory that contains the images you wish to process.
  3. By default, Scan Tailor will create an ‘out’ directory for the processed images inside the input directory. Leave it like that.
  4. At the next stage it’s necessary to set the “dots-per-inch” (DPI) level. Click on ‘All pages’, and use the ‘Custom’ drop-down menu to select ‘300 x 300’ [7].
  5. We may skip the ‘Fix orientation’ step if we have already rotated the images using jpegtran.
  6. Click on ‘Split pages’, and Scan Tailor will try to autodetect the page layout. In our case, with a single-camera book scanner taking one page at a time, the correct option is the central one in the Page Layout section (i.e., single page with something to crop at the edges). Then click on the > arrow to have Scan Tailor batch process the remaining images according to the selected Page Layout option. You should always inspect the individual pages in the thumbnails to the right of the main page, to make sure that Scan Tailor does not cut away some portion of text or image. Clicking on a thumbnail loads that page into the main area and allows manual adjustment of various parameters.
  7. After having split the pages, you need to ‘deskew’ them, that is, straighten the pages so that the text is correctly aligned. Once again, use the > arrow to batch process as needed and use the thumbnails to quickly check each page.
  8. Click on ‘Select content’ and batch process (> right arrow), while always keeping an eye on the thumbnails. If something is not selected, you should adjust it manually by selecting the thumbnail and resizing the content box in the main page as needed (by dragging its edges). If no content is selected at all, a content box may be added manually by selecting that image, right-clicking in the main window and choosing ‘Add content box’. Likewise, a content box may be removed from a page by right-clicking and choosing ‘Delete’.
  9. Then move to the ‘Margins’ section and adjust the various parameters according to your requirements. The Scan Tailor defaults work well in most cases. Again, click the > arrow and then go on to the final step.
  10. In the ‘Output’ section, you can select the type of image output desired. Black and white will produce a clean output image for simple text and line drawings. If some text appears to be ‘missing’, try increasing the line thickness to see if it appears. Once you have chosen your settings, they may be applied to all pages and run through the batch process (> arrow). Alternatively, for images with photographs or graphics, you may select ‘mixed’ or even ‘colour/grayscale’. Our recommendation is to stick to black & white whenever possible, so that the file will load faster and have a smaller size, and to select either 300 or 400 DPI as ‘Output resolution’, that is, about the same as the input one (600 is probably too much for simple text). In any case, it’s always better to check the thumbnails, particularly at this final stage, to make sure that required content is not excluded by mistake.
  11. The ‘output’ batch process is usually the one that takes the longest to complete, depending on the overall performance of your computer. Set up Scan Tailor to ‘beep when finished’ and go have a coffee or tea. The output files will be saved as TIFF in the default ‘out’ directory inside the ‘renamed’ directory.
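
Once the output batch has finished, it can be worth confirming that every input image actually produced a TIFF in the ‘out’ directory. The following is only a minimal Python sketch and is not part of Scan Tailor. The function name `missing_outputs` and the list of input extensions are hypothetical choices made for illustration. It also assumes that each output file keeps the base name of its input image, which may not hold if a page was split in two.

```python
from pathlib import Path

# Extensions treated as input scans (an assumption for this sketch).
INPUT_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
TIFF_EXTENSIONS = {".tif", ".tiff"}


def missing_outputs(input_dir, out_name="out"):
    """Return the names of input images that have no TIFF in the output directory.

    Assumes each output TIFF keeps the base name of its input image.
    """
    input_dir = Path(input_dir)
    out_dir = input_dir / out_name

    # Base names of the TIFF files already present in the output directory.
    done = set()
    if out_dir.is_dir():
        done = {
            p.stem
            for p in out_dir.iterdir()
            if p.is_file() and p.suffix.lower() in TIFF_EXTENSIONS
        }

    # Input images whose base name has no counterpart among the outputs.
    missing = []
    for p in sorted(input_dir.iterdir()):
        if p.is_file() and p.suffix.lower() in INPUT_EXTENSIONS and p.stem not in done:
            missing.append(p.name)
    return missing
```

For example, `missing_outputs("renamed")` would list any image in the ‘renamed’ directory that has no matching TIFF in ‘renamed/out’; an empty list suggests the batch covered every page.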

 
