Applications and Limitations of Tesseract

Applications of Tesseract

License Plate Recognition:

Tesseract, an Optical Character Recognition (OCR) engine, can be used to automatically recognize the text on vehicle registration plates. A typical pipeline first detects and localizes the license plate in an input image or video frame, then extracts the characters from the plate, and finally applies OCR to recognize the extracted characters.
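
The three steps above can be sketched in Python. This is a minimal illustration, not a production plate reader: it assumes the opencv-python and pytesseract packages are installed, that the plate is the largest roughly four-cornered contour in the image, and that plates contain only Latin letters and digits. The function names read_plate and clean_plate_text, and the filter and edge-detection thresholds, are hypothetical choices made for this example.

```python
import re


def clean_plate_text(raw: str) -> str:
    """Keep only uppercase letters and digits from raw OCR output."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def read_plate(image_path: str) -> str:
    """Localize a plate-like region, binarize it, and OCR it (illustrative)."""
    # Imported lazily so clean_plate_text works without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)

    # Step 1: detect and localize. Assumption: the plate shows up as one of
    # the largest contours with four corners.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.bilateralFilter(gray, 11, 17, 17)
    edges = cv2.Canny(smoothed, 30, 200)
    # [-2] selects the contour list in both OpenCV 3 and OpenCV 4.
    contours = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:10]

    plate = None
    for contour in contours:
        perimeter = cv2.arcLength(contour, True)
        corners = cv2.approxPolyDP(contour, 0.02 * perimeter, True)
        if len(corners) == 4:
            x, y, w, h = cv2.boundingRect(corners)
            plate = gray[y:y + h, x:x + w]
            break
    if plate is None:
        return ""  # no plate-like region found

    # Step 2: extract the characters by separating text from background.
    _, binary = cv2.threshold(plate, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Step 3: recognize. "--psm 7" tells Tesseract to treat the crop as a
    # single line of text.
    raw = pytesseract.image_to_string(binary, config="--psm 7")
    return clean_plate_text(raw)
```

Calling read_plate("car.jpg") (a hypothetical file name) returns the cleaned plate string, or an empty string when no plate-like contour is found. Real scenes often need a more robust detector than this contour heuristic.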

Handwriting Recognition:

The printed or handwritten document is first scanned. The OCR software then separates the scanned image into individual characters and matches each one to the most likely letter in its database of known characters.

Limitations of Tesseract

Tesseract gives the best results when the foreground text is cleanly segmented from the background. In practice, such a setup can be extremely hard to achieve. Output quality can suffer for a variety of reasons, for example when the image has a noisy background. The better the image quality (size, contrast, lighting), the better the recognition result. Tesseract OCR is quite powerful, but it has the following limitations:

  • Tesseract requires a bit of preprocessing to improve the OCR results: Images need to be scaled appropriately, have as much image contrast as possible, and the text must be horizontally aligned.
  • It does not do well with images affected by artifacts, including partial occlusion, distorted perspective, and complex backgrounds.
  • It is not designed for handwriting. Its models are trained mainly on printed text, so results on handwritten text are generally unreliable.
  • It may find gibberish and report this as OCR output.
  • If a document contains languages outside of those given in the -l LANG arguments, results may be poor.
  • It is not always good at analyzing the natural reading order of documents. For example, it may fail to recognize that a document contains two columns, and may try to join text across columns.
  • Poor quality scans may produce poor quality OCR.
  • It does not expose information about what font family text belongs to.
  • Finally, Tesseract OCR is officially supported only on Linux, Windows, and macOS.
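
Two of the limitations above, the need for preprocessing and gibberish in the output, can be reduced in code. The sketch below assumes the opencv-python and pytesseract packages are installed. It upscales and binarizes the image, passes the language explicitly (the equivalent of the -l LANG argument), and drops words whose reported confidence is low. The function names, the 2x scale factor, and the confidence cutoff of 60 are assumptions chosen for illustration, not recommended values.

```python
def filter_confident_words(data: dict, min_conf: float = 60.0) -> list:
    """Return words whose confidence is at least min_conf.

    `data` is expected to have parallel "text" and "conf" lists, as in the
    dictionary returned by pytesseract.image_to_data with Output.DICT.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Tesseract reports -1 for boxes that are not words. Depending on the
        # pytesseract version, conf may be a string, so convert it first.
        if text.strip() and float(conf) >= min_conf:
            words.append(text.strip())
    return words


def ocr_confident_words(image_path: str, lang: str = "eng",
                        min_conf: float = 60.0) -> list:
    """Preprocess an image, run Tesseract, and keep confident words only."""
    # Imported lazily so filter_confident_words works without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)

    # Preprocessing: grayscale, enlarge small text, and raise contrast by
    # binarizing with Otsu's threshold.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    data = pytesseract.image_to_data(
        binary, lang=lang, output_type=pytesseract.Output.DICT
    )
    return filter_confident_words(data, min_conf)
```

A confidence cutoff trades recall for precision: a higher value removes more gibberish but can also discard correctly recognized words from poor scans, so it is worth tuning on sample documents.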
