Homer bash script

Page history last edited by juliano 12 years, 7 months ago

    download standalone script: homer.sh

The script loads a directory of images, which is expected to contain 2n images (that is, an even number), and applies a series of manipulations to them. If no arguments are entered, the script prints the “Usage” info:

Usage: homer /path/to/input/directory [-r, -R, -L, -l "lang"] [-o "output"]

Options:
-r, --rename
    rename the first half of the files with incremental odd numbers, the second
    half with incremental even numbers (JPEG only);
-R, --right
    rename (as above) and rotate the right-hand pages 90 degrees clockwise, and
    the left-hand pages 90 degrees counterclockwise, starting from the RIGHT
    pages (JPEG only);
-L, --left
    rename and rotate (as above) starting from the LEFT pages (JPEG only);
-l, --lang
    run Tesseract OCR and convert the images into a searchable PDF (TIFF only),
    type "homer -l" (or "--lang") to view the list of supported languages;
-o, --output
    for [-r, -R, -L], the output is the path to the directory where the renamed
        images should be saved (default is $1/renamed);
    for [-l], the output is the path to the final PDF, or its filename (default
        is $PWD/out.pdf).

If only the first argument (i.e., the input directory) is entered, the script prompts the user to choose whether to rename and rotate, simply rename, or run the OCR engine and convert to PDF.

The options -R, -L, and -r reorder a batch of JPEG images so that the first half and the second half are interleaved: the first half is renamed with incremental odd numbers, while the second half is renamed with incremental even numbers.
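The interleaving described above can be sketched as follows. This is only an illustration, not code taken from homer.sh: the function name interleave_names and the zero-padded four-digit target names are assumptions, and the sketch only prints the mapping instead of renaming anything.

```bash
#!/usr/bin/env bash
# Sketch only: interleave_names is a hypothetical helper, not part of homer.sh.
# It prints "source -> target" pairs: files in the first half of the argument
# list get odd numbers, files in the second half get even numbers.
interleave_names() {
    total=$#
    if [ $(( total % 2 )) -ne 0 ]; then
        echo "expected an even number of files, got $total" >&2
        return 1
    fi
    half=$(( total / 2 ))
    i=0
    for f in "$@"; do
        if [ "$i" -lt "$half" ]; then
            num=$(( 2 * i + 1 ))            # first half: 1, 3, 5, ...
        else
            num=$(( 2 * (i - half) + 2 ))   # second half: 2, 4, 6, ...
        fi
        # The 4-digit zero-padded name is an assumed naming scheme.
        printf '%s -> %04d.jpg\n' "$f" "$num"
        i=$(( i + 1 ))
    done
}

interleave_names a.jpg b.jpg c.jpg d.jpg
# a.jpg -> 0001.jpg
# b.jpg -> 0003.jpg
# c.jpg -> 0002.jpg
# d.jpg -> 0004.jpg
```

Sorting the renamed files by name then yields the pages in reading order, alternating between the two halves of the original batch.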

The options -R and -L also fix the orientation of the JPEG images by rotating the pages on the right-hand side of the book 90 degrees clockwise, and those on the left-hand side 90 degrees counterclockwise.
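One way to read the rotation rule is as a function of the new page number and the starting side. The sketch below assumes that the first half of the batch (the odd-numbered files after renaming) holds the pages of the starting side; the helper name rotation_for is hypothetical, and the page does not say which tool homer.sh uses to perform the rotation, so the sketch only computes the clockwise angle.

```bash
#!/usr/bin/env bash
# Sketch only: rotation_for is a hypothetical helper, not part of homer.sh.
# Usage: rotation_for PAGE_NUMBER START_SIDE
#   START_SIDE is "right" (as with -R) or "left" (as with -L).
# Prints the clockwise rotation in degrees: 90 for right-hand pages,
# 270 (that is, 90 counterclockwise) for left-hand pages.
rotation_for() {
    page=$1
    start=$2
    # Assumption: odd-numbered files belong to the side that was scanned first.
    if [ $(( page % 2 )) -eq 1 ]; then
        side=$start
    elif [ "$start" = right ]; then
        side=left
    else
        side=right
    fi
    if [ "$side" = right ]; then
        echo 90
    else
        echo 270
    fi
}

rotation_for 1 right   # 90
rotation_for 2 right   # 270
rotation_for 1 left    # 270
```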

Finally, the option -l runs “Tesseract OCR” to extract the text from a batch of TIFF images, and then uses “PDFbeads” to bind the images and the text into a single, searchable PDF. The -l option must be followed by the three-character label for the desired Tesseract language. To print the list of supported languages with their corresponding labels, type homer -l or homer --lang.

The -o option specifies one of the following: the output location of the renamed/rotated files (a new folder is created if it doesn’t exist); the location where the final PDF document is saved (as “out.pdf” if no filename is specified); or the name of the PDF file (the .pdf extension is optional and is appended automatically if the user omits it). The option accepts both absolute and relative paths.
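The PDF-related rules for -o could be resolved along these lines. This is a sketch under assumptions rather than the logic of homer.sh itself: the function name resolve_pdf_path is hypothetical, and treating an existing directory as "no filename specified" is one plausible reading of the description above.

```bash
#!/usr/bin/env bash
# Sketch only: resolve_pdf_path is a hypothetical helper, not part of homer.sh.
# Usage: resolve_pdf_path [OUTPUT]
# Prints the path the final PDF would be written to.
resolve_pdf_path() {
    out=$1
    if [ -z "$out" ]; then
        # No -o given: default documented in the usage text.
        printf '%s\n' "$PWD/out.pdf"
    elif [ -d "$out" ]; then
        # A location without a filename (assumed: an existing directory).
        printf '%s\n' "${out%/}/out.pdf"
    else
        case $out in
            *.pdf) printf '%s\n' "$out" ;;       # extension already present
            *)     printf '%s\n' "$out.pdf" ;;   # append the missing extension
        esac
    fi
}

resolve_pdf_path mybook          # mybook.pdf (if no directory named mybook exists)
resolve_pdf_path scans/book.pdf  # scans/book.pdf
```

Because the function only manipulates strings, relative and absolute paths are handled the same way.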

The renaming/rotating part of the script is inspired by Matti Kariluoma and his “RenameAll.exe” script, which he wrote for the already mentioned Cardboard Bookscanner project that appeared on Instructables.com.

 
