Tesseract, an Open Source Optical Character Recognition Engine

Tesseract is a free, open source Optical Character Recognition (OCR) engine that runs on several operating systems and is available under the Apache 2.0 license. OCR technology converts scanned images of printed text into information that a computer program can process, and it is commonly used in office and home environments. OCR engines are also often used as components of larger systems that track information through visual cues attached to objects, for example in a supply chain. Developers use OCR technology to digitize print-based texts and create electronic resources such as corpora and lexica.


Tesseract was originally developed as proprietary software at Hewlett-Packard labs between 1985 and 1994. The code was written in C and C++. Tesseract was never used commercially, and its development ceased. In 2005, the engine was released as open source, and since 2006 its development has been sponsored by Google.


Tesseract was designed to serve as a component of other systems or programs, so it is an OCR engine rather than a complete OCR program with a full set of features. It has no built-in GUI, so users work with it from the command line. Many separate third-party projects, however, integrate Tesseract and provide a GUI for it.
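
As a rough illustration of command-line use, the Python sketch below builds and runs a basic Tesseract call. It assumes the tesseract executable is installed and on the PATH, and it relies on the common invocation form "tesseract IMAGE OUTPUTBASE -l LANG", where the special output base "stdout" prints the recognized text instead of writing a file. The function names and the file name scan.png are hypothetical and chosen only for this example.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", output_base="stdout"):
    """Build the argument list for a basic Tesseract command-line call.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def run_tesseract(image_path, lang="eng"):
    """Run Tesseract on an image and return the recognized text.

    Returns None when no tesseract executable is found on the PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract reports a failure
    )
    return result.stdout


if __name__ == "__main__":
    # "scan.png" is a placeholder; substitute a real image file.
    print(build_tesseract_command("scan.png"))
```

Keeping command construction separate from execution makes the sketch easy to inspect on a machine where Tesseract is not installed.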


Tesseract can be used directly, and programmers can use its API to extract handwritten, typed, or printed text from images. It can also serve as a back end for more complex OCR tasks, such as layout analysis, when combined with front-end tools such as OCRopus.


Tesseract is available for Windows, Linux, and Mac OS X. The first version of the engine could recognize only English text, but later versions extended support to more than 100 languages. It can also be trained to work with other scripts and languages.
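
To make the language support more concrete, the following Python sketch lists the language data installed on a machine and combines several language codes for a single run. It assumes the tesseract executable is on the PATH, that "tesseract --list-langs" prints a header line beginning with "List of" followed by one code per line, and that codes joined with "+" (for example eng+deu) are accepted by the -l option. The function names are hypothetical, and the exact output format may differ between releases.

```python
import shutil
import subprocess


def parse_language_list(output):
    """Extract language codes from text printed by `tesseract --list-langs`.

    Assumes a header line starting with "List of", then one code per line.
    """
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header that introduces the list.
        if not line or line.startswith("List of"):
            continue
        codes.append(line)
    return codes


def installed_languages():
    """Return installed language codes, or an empty list without Tesseract."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Assumption: some releases may print the list on stderr, so fall back to it.
    return parse_language_list(result.stdout or result.stderr)


def combine_languages(codes):
    """Join language codes with "+" to form a value for the -l option."""
    return "+".join(codes)


if __name__ == "__main__":
    print(installed_languages())
    print(combine_languages(["eng", "deu"]))
```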

Developers can use Tesseract in their own projects. The engine has a full-featured API, so it can be built into a variety of applications, including ones for iPhone and Android.


