OCRopus

OCRopus is a free document analysis and OCR system released under the Apache License, Version 2.0. Its design is highly modular: components are implemented as plugins, so they can be swapped out easily. Development is led by Thomas Breuel of the German Research Centre for Artificial Intelligence in Kaiserslautern, Germany, and is sponsored by Google. OCRopus is developed for Linux; however, users have reported success running it on Mac OS X.

How it works

OCRopus combines pluggable layout analysis, pluggable character recognition, and pluggable language modeling. It is aimed primarily at high-volume document conversion, namely for Google Book Search, but also at desktop and office use and at use by visually impaired people.

Currently, OCRopus uses Tesseract as its only character recognition plugin, but others are expected to be added in the future. The plugin design is especially useful for extending OCRopus to additional languages and writing systems. OCRopus also contains disabled code for a handwriting recognition engine, which may be repaired in the future.

OCRopus itself does image preprocessing and layout analysis; it chops up the scanned document before passing it to Tesseract for line-by-line or character-by-character recognition.

As of the alpha release, OCRopus uses the language modeling code from another Google-supported project, OpenFST.
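The division of labour described above (layout analysis, then line recognition, then language modeling, each replaceable) can be illustrated with a minimal Python sketch. Every name here (Pipeline, split_on_newlines, fake_recognizer, fix_common_confusions) is hypothetical and invented for illustration; none of it is OCRopus's actual plugin interface, and a plain string stands in for a scanned page image.

```python
from typing import Callable, List

# Hypothetical plugin signatures, for illustration only (not the OCRopus API).
LayoutAnalyzer = Callable[[str], List[str]]  # page -> line "images"
LineRecognizer = Callable[[str], str]        # line "image" -> raw text
LanguageModel = Callable[[str], str]         # raw text -> corrected text


class Pipeline:
    """Chains three swappable stages, mirroring the stages described above."""

    def __init__(self, layout: LayoutAnalyzer, recognizer: LineRecognizer,
                 language_model: LanguageModel) -> None:
        self.layout = layout
        self.recognizer = recognizer
        self.language_model = language_model

    def run(self, page: str) -> List[str]:
        # 1. Chop the page into lines, 2. recognize each line,
        # 3. post-process the raw text with the language model.
        lines = self.layout(page)
        return [self.language_model(self.recognizer(line)) for line in lines]


# Toy stand-ins: a string plays the role of a scanned page.
def split_on_newlines(page: str) -> List[str]:
    return [line for line in page.splitlines() if line.strip()]


def fake_recognizer(line_image: str) -> str:
    return line_image.strip()


def fix_common_confusions(text: str) -> str:
    # A stand-in for language modeling: repair a zero/letter-O confusion.
    return text.replace("0CR", "OCR")


pipeline = Pipeline(split_on_newlines, fake_recognizer, fix_common_confusions)
result = pipeline.run("  0CRopus is modular \n\n plugins can be swapped ")
print(result)
```

Because each stage is just a callable with a fixed signature, replacing the recognizer (for example, to support another writing system) would not require changes to the other two stages.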

History

Release history:

  • Initial announcement - 9 April 2007
  • 0.1.0 - Alpha - 22 Oct 2007
  • 0.1.1 - 14 Dec 2007 - Improved build system
  • 0.2 - Alpha 2 - 31 May 2008
  • Beta - Scheduled for August 2008
  • 1.0 - Scheduled for Q3 2008 - Packaging for additional operating systems, GUI

Usage

Currently, OCRopus can only be used from the command line. Once installed, it is invoked by specifying the input images, and it writes hOCR HTML code to standard output. If more precise control is needed, options can be specified on the command line to perform specific operations (e.g. recognizing a single line).
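Since the output is hOCR, which is ordinary HTML with OCR metadata carried in class and title attributes, it can be post-processed with standard tools. The following is a rough sketch using only Python's standard library. It assumes that text lines are marked with the hOCR class ocr_line and carry a "bbox x0 y0 x1 y1" entry in their title attribute; the sample markup is made up for illustration and is not actual OCRopus output, and the class name HocrLineParser is hypothetical.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags with no end tag


class HocrLineParser(HTMLParser):
    """Collects (bbox, text) pairs for elements of class 'ocr_line'."""

    def __init__(self) -> None:
        super().__init__()
        self.lines = []   # list of (bbox tuple or None, text)
        self._depth = 0   # nesting depth inside the current ocr_line
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self._depth:
            self._depth += 1  # element nested inside a line
        elif "ocr_line" in (attrs.get("class") or "").split():
            match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)",
                              attrs.get("title") or "")
            self._bbox = tuple(map(int, match.groups())) if match else None
            self._depth = 1
            self._text = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # the ocr_line element itself just closed
            self.lines.append((self._bbox, "".join(self._text).strip()))

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


# Made-up sample in hOCR style; real input would be the captured standard output.
sample = """<div class='ocr_page'>
<span class='ocr_line' title='bbox 10 20 300 45'>Hello world</span>
<span class='ocr_line' title='bbox 10 50 280 75'>second <b>line</b></span>
</div>"""

parser = HocrLineParser()
parser.feed(sample)
for bbox, text in parser.lines:
    print(bbox, text)
```

The sketch assumes reasonably well-formed markup; for messy real-world output a more tolerant HTML library may be a better choice.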
