OCR with Pytesseract and OpenCV
Pytesseract is a Python wrapper for the Tesseract-OCR Engine. It is also useful as a stand-alone invocation script for tesseract, as it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others.
Preprocessing for Tesseract
The main objective of the preprocessing phase is to make it as easy as possible for the OCR system to distinguish a character or word from the background.
Some of the most basic and important preprocessing techniques are:
- Binarization.
- Skew Correction.
- Noise Removal.
- Thinning and Skeletonization.
- Binarization:
In layman’s terms, binarization means converting a coloured image into an image that consists of only black and white pixels (black pixel value = 0 and white pixel value = 255). As a basic rule, this can be done by fixing a threshold (normally threshold = 127, as it is roughly the midpoint of the pixel range 0–255). If the pixel value is greater than the threshold, it is treated as a white pixel; otherwise it is treated as a black pixel.
This strategy does not always give the desired result, however. When the lighting is not uniform across the image, a single fixed threshold fails, because the same character can fall above the threshold in a bright region and below it in a shadowed one.
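The fixed-threshold rule, and the way it breaks down under poor lighting, can be sketched with plain NumPy. This is only an illustration: the function name `binarize` and the tiny synthetic arrays are made up for the example, and the "shadow" is simulated by halving every pixel value.

```python
import numpy as np

def binarize(gray, threshold=127):
    """Return a black-and-white copy of a grayscale uint8 image."""
    # Pixels above the threshold become white (255), the rest black (0).
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

# Synthetic "page": dark text pixels (40) on a light background (200).
page = np.array([[200, 200, 40],
                 [200, 40, 200]], dtype=np.uint8)
binary = binarize(page)

# The same page under a simulated shadow (all values halved).
# Every pixel is now below 127, so text and background both turn black.
shadowed = (page // 2).astype(np.uint8)
binary_shadowed = binarize(shadowed)
```

On the well-lit page the text and background separate cleanly, while on the shadowed copy the whole image collapses to black, which is the failure case described above.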
- Skew Correction:
A scanned document is sometimes slightly skewed (aligned at an angle to the horizontal). Detecting and correcting this skew is crucial before extracting information from the scanned image.
To determine the skew, we take the binary image and project it horizontally to get a histogram of pixel counts along the height of the image. The image is then rotated through a range of angles and, for each angle, the difference between the peaks of the histogram is calculated. The angle that gives the maximum difference between peaks is taken as the skew angle. Once the skew angle is known, we correct the skew by rotating the image through the same angle in the opposite direction.
- Noise Removal:
The main objective of the noise removal stage is to smooth the image by removing small dots or patches that have a higher intensity than the rest of the image. Noise removal can be performed on both coloured and binary images.
- Thinning and Skeletonization:
This is an optional preprocessing task which depends on the context in which the OCR is being used.
If we are using the OCR system for printed text, there is no need to perform this task, because printed text generally has a uniform stroke width.
If we are using the OCR system for handwritten text, this task has to be performed, since different writers have different styles of writing and hence different stroke widths. To make the width of the strokes uniform, we perform thinning and skeletonization.
Image Thresholding
There is no single image thresholding method that fits all types of documents. In practice, every filter performs differently on different images: a filter that binarizes some images successfully may fail on others, and may in turn work well on images that other filters cannot binarize.
Types of thresholding:
- Simple Threshold
- Adaptive Threshold
- Otsu’s Threshold
- Simple Thresholding
If a pixel value is greater than a threshold value, it is assigned one value (for example, white); otherwise it is assigned another value (for example, black). The function used is cv.threshold.
- First argument is the source image, which should be a grayscale image.
- Second argument is the threshold value which is used to classify the pixel values.
- Third argument is maxVal, the value assigned to a pixel whose value is more than (or, for the inverted types, less than) the threshold value.
- Fourth argument selects the style of thresholding, for example cv.THRESH_BINARY or cv.THRESH_BINARY_INV.
- Adaptive Threshold
Simple thresholding may not work well when the image has different lighting conditions in different areas. In that case, we use adaptive thresholding. Here the algorithm calculates the threshold for small regions of the image, so we get different thresholds for different regions of the same image, which gives better results for images with varying illumination.
It has three ‘special’ input parameters (the adaptive method, the block size of the neighbourhood, and a constant C that is subtracted from the computed mean or weighted sum) and only one output, the thresholded image. The adaptive method can be one of:
- cv.ADAPTIVE_THRESH_MEAN_C : threshold value is the mean of neighbourhood area
- cv.ADAPTIVE_THRESH_GAUSSIAN_C : threshold value is the weighted sum of neighbourhood values where weights are a gaussian window.
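- Otsu’s Threshold
This method works particularly well with bimodal images, that is, images whose histogram has two peaks. Instead of using a threshold chosen by hand, Otsu’s method picks a threshold between the two peaks automatically from the histogram.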