In our last blog post about Automated Meter Reading (AMR), we tackled the automated reading of meter digits by using different (static) preprocessing methods to segment single digits, followed by a template matching procedure to classify them. This approach led to quite satisfactory results, but only in a rather static setup. This time we wanted to dive into the topic again and generalize the whole process, making it applicable to various environments. In this post we therefore describe how we used different thresholding methods in parallel, letting our system select the best-fitting one for every situation. Besides that, we explain how and why we replaced the template matching approach with the powerful Tesseract OCR engine and how this improved our results compared to part 1 of our blog series. As already specified in part 1, we want to focus on images taken by smartphone cameras. These images can then be analyzed directly on the smartphone or be forwarded to a human operator for manual processing if the system isn’t confident enough about its analysis.
Analysis of the Previous System
In contrast to the first part of our AMR series, where we achieved good results in a static environment, we now want to create a more adaptable solution. To achieve this, we first had to identify the AMR steps that currently prevent generalization, such as the thresholding (including edge detection). In our first approach, we optimized the thresholding for a small set of images, which worked just fine for those images. However, without more data points to optimize on, such fixed settings do not carry over to other images. That’s why we came up with the idea of using Otsu thresholding, which adapts its threshold to each image according to its histogram. This approach led to slightly better results but still wasn’t satisfactory.
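To make the idea of a histogram-based threshold concrete, here is a minimal NumPy sketch of Otsu’s method. It is an illustration under our own assumptions, not the code from our project: the function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be an 8-bit grayscale image. OpenCV offers the same technique through `cv2.threshold` with the `cv2.THRESH_OTSU` flag.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold of an 8-bit grayscale image (dtype uint8)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    # Cumulative weight and cumulative mean of the "dark" class for every t.
    weight_dark = np.cumsum(prob)
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate threshold t.
    denom = weight_dark * (1.0 - weight_dark)
    numer = (total_mean * weight_dark - cum_mean) ** 2
    between = np.zeros(256)
    valid = denom > 0
    between[valid] = numer[valid] / denom[valid]

    # The threshold that separates the two classes best.
    return int(np.argmax(between))


def binarize(gray: np.ndarray, threshold: int) -> np.ndarray:
    """Map pixels brighter than the threshold to 255 and all others to 0."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)
```

Because the threshold is derived from the histogram of each individual image, no parameter has to be tuned by hand. The price is that a single global threshold is used, which is one reason why uneven lighting remains a problem.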
However, properly working thresholding is required in order to segment the digits and make the prediction more robust later on. After taking a look at other computer vision projects that have only small amounts of data to adapt their thresholding to, we came up with the approach of using multiple thresholding methods and letting our system choose the best-fitting one. This concept raised yet another challenge: which metric should our system use to determine the most suitable thresholding option? With traditional machine learning or deep learning methods this would be an easy task, since they provide a probability of the result being correct (often called confidence). We would simply use the method that yields the highest confidence score for a prediction. The template matching approach, however, doesn’t provide any metric like that.
Besides lacking a correctness metric, the template matching approach also blocks generalization by the way it fundamentally works. As a small recap: in our template matching approach, we use one picture per digit as our ground truth (the so-called template) and calculate the Euclidean distance from every template to each segmented digit in the processed image. For every segment, the template with the smallest distance is then used as its class label. Thinking about metering devices with slightly different fonts or pictures taken under different lighting conditions, the problem of matching a template is quite obvious. To overcome this and make our digit recognition more accurate, we could either use various templates for comparison (which would also require a larger dataset) or switch to a completely new method. We chose the latter option and switched to Tesseract OCR, an open-source optical character recognition engine that works without any data of our own. The engine itself is based on LSTMs (Long Short-Term Memory networks) and can recognize more than 100 languages “out of the box”. Additionally, you can train and use your own language files generated from custom fonts to fit the engine to your problem as well as possible.
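The recap above can be sketched in a few lines of NumPy. This is only an illustration of nearest-template classification by Euclidean distance; the function name `classify_segment` is hypothetical, and we assume that every segment has already been resized to the shape of the templates.

```python
import numpy as np


def classify_segment(segment, templates):
    """Return (label, distance) of the template closest to the segment.

    `templates` maps a digit label to an array with the same shape as
    `segment` (an assumption of this sketch).
    """
    best_label, best_distance = None, float("inf")
    pixels = segment.astype(np.float64)
    for label, template in templates.items():
        # Euclidean distance over all pixels of the two images.
        distance = float(np.linalg.norm(pixels - template.astype(np.float64)))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance
```

Note that the smallest distance is only meaningful relative to the other templates. It says nothing about whether the best match is actually a good one, which is exactly the missing confidence discussed above.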
After starting to work with Tesseract, we discovered that the results are more reliable when the engine is given individually segmented characters and digits. Segmentation of any sort is a well-known problem in machine learning, but once again these methods require data, which is not available. That’s why we developed a purely optical approach using OpenCV and various conditions to decide whether single parts of our image contain segmented digits.
With all of that in mind, we identified several parts of the system that have to be changed in order to make our application work as desired. The key changes are:
- Adaptive Thresholding
- Advanced Digit Segmentation
- Tesseract OCR and its Confidence
The following sections describe in detail how we implemented these steps.
Adaptive Thresholding
As described in the previous section, the thresholding was one of the biggest parts we had to change to generalize our system. First of all, we eliminated image noise, because it obstructs the thresholding later on. We did that with Non-Local Means Denoising and Gaussian Blur. Next, we used four different thresholding methods to preprocess our images. Three of them are different configurations of the so-called Local Adaptive Binarization, which Wolf et al. presented in their 2002 paper Text Localization, Enhancement and Binarization in Multimedia Documents. This binarization is based on Niblack’s algorithm for creating a thresholded image: a rectangular window glides across the image, and the threshold value for the center pixel is computed from the mean and the variance of the gray values in this window.
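Written as a formula, Niblack’s threshold for the pixel at position (x, y) is commonly given as follows. Here m and s denote the mean and the standard deviation (the square root of the variance) of the gray values in the window centered at (x, y), and k is a tunable parameter; the values of k and of the window size are not taken from our project.

```latex
T(x, y) = m(x, y) + k \cdot s(x, y)
```

A pixel is then assigned to the foreground or the background depending on whether its gray value lies below or above T(x, y). For dark text on a bright background, k is usually chosen negative.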
Among other improvements to this approach that reduce the amount of noise, Wolf et al. changed the threshold calculation in order to normalize the contrast and the mean gray level of the image. We used various window sizes to ensure that the digits are separated and to obtain different stroke thicknesses, which improves the OCR result. The fourth method we use is the histogram-based Otsu thresholding. With this setup we obtained an effective thresholding technique for diverse test situations. You can see the results in the pictures below.
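The following NumPy sketch shows how such a local binarization with several window sizes could look. It uses the threshold T = (1 - k) * m + k * M + k * (s / R) * (m - M) as we recall it from the paper by Wolf et al., where m and s are the local mean and standard deviation, M is the minimum gray value of the image and R is the largest local standard deviation. The function names, the value k = 0.5, the window sizes and the output polarity (dark text becomes 0) are assumptions of this sketch, not our project settings.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_stats(gray, window):
    """Mean and standard deviation of every window x window neighborhood.

    `window` must be odd so that the output has the shape of the input.
    """
    pad = window // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    views = sliding_window_view(padded, (window, window))
    return views.mean(axis=(-2, -1)), views.std(axis=(-2, -1))


def wolf_binarize(gray, window=15, k=0.5):
    """Binarize a grayscale image with a Wolf-style local threshold."""
    mean, std = local_stats(gray, window)
    min_gray = float(gray.min())              # M: darkest gray value
    max_std = max(float(std.max()), 1e-9)     # R: largest local deviation
    threshold = ((1 - k) * mean + k * min_gray
                 + k * (std / max_std) * (mean - min_gray))
    # Pixels brighter than their local threshold become background (255).
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


def candidate_binarizations(gray, windows=(9, 15, 25)):
    """One binarized image per (hypothetical) window size."""
    return {f"wolf_{w}": wolf_binarize(gray, window=w) for w in windows}
```

The sliding-window view keeps the code short but is memory-hungry for large images; an implementation based on integral images or box filters would scale better.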
Advanced Digit Segmentation
To achieve better results in the classification of meter values, we first have to segment the values into single digits. We accomplish that with the OpenCV findContours function, just as in our previous version. This function finds contours in binary images, from which bounding boxes can be created with another OpenCV function: boundingRect. The contour function works particularly well on an image containing only the region of interest, but gets less precise when the image contains additional information, like other text written on metering devices. To overcome this issue, we also took the contour hierarchy as well as the positions and sizes of the bounding boxes into account. The hierarchy helps us to distinguish inner and outer contours. Inner contours, such as the inner circle of a zero, are automatically discarded since the outer contour of the digit has already been found. The position of the bounding boxes relative to each other is considered because the meter digits have to be aligned in a roughly straight horizontal line. Contours that aren’t aligned along this line are treated as meta information and are therefore discarded as well.
Last but not least, we use the size and the aspect ratio of the contours and drop the ones that are too small, too big or have the wrong aspect ratio. You can see the segmentation and the categorization of the segments in the plots below. The green boxes are considered correct and are used for further processing, while the red and blue ones are considered redundant meta information and are discarded.
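The filtering rules described above can be sketched in plain Python. The sketch assumes that the bounding boxes come from cv2.boundingRect and that the parent indices are taken from the hierarchy returned by cv2.findContours in a mode such as cv2.RETR_CCOMP, where outer contours have the parent -1. The function name and all limits (heights, aspect ratios, allowed row offset) are hypothetical values for illustration, not the conditions we actually tuned.

```python
from statistics import median


def filter_digit_boxes(boxes, parents, min_height=20, max_height=200,
                       min_aspect=0.2, max_aspect=0.9, max_row_offset=0.5):
    """Keep only bounding boxes that plausibly contain a meter digit.

    boxes   -- list of (x, y, w, h) tuples, one per contour
    parents -- parent index per contour, -1 for outer contours
    """
    # 1. Discard inner contours such as the hole of a zero.
    candidates = [box for box, parent in zip(boxes, parents) if parent == -1]

    # 2. Discard boxes with an implausible size or aspect ratio (w / h).
    candidates = [
        (x, y, w, h) for x, y, w, h in candidates
        if min_height <= h <= max_height and min_aspect <= w / h <= max_aspect
    ]
    if not candidates:
        return []

    # 3. Keep boxes whose vertical center lies close to the median row.
    row = median(y + h / 2 for _, y, _, h in candidates)
    typical_height = median(h for _, _, _, h in candidates)
    aligned = [
        (x, y, w, h) for x, y, w, h in candidates
        if abs(y + h / 2 - row) <= max_row_offset * typical_height
    ]

    # Return the digits in reading order, from left to right.
    return sorted(aligned, key=lambda box: box[0])
```

Using the median makes the row estimate robust against a few misplaced boxes, as long as the digits form the majority of the remaining candidates.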
Tesseract OCR and its Confidence
In the final step, we used Tesseract OCR to overcome the limitations of template matching and to receive a confidence value for each reading prediction. Since our project is implemented in Python and Tesseract itself is developed in C++, we had to include a Python wrapper called pytesseract in our prototyping infrastructure. As already mentioned, Tesseract works with language packages that represent the knowledge of the system. For our work, we used a special language package called digit_comma, which was trained on digits visually similar to the ones used on the tested metering devices.
Since we had to decide which of our thresholding methods works best, we needed a metric for the correctness of the predictions. The typical image_to_string function doesn’t provide such a metric, so we obtained it from pytesseract’s image_to_data function. It returns a confidence value representing the probability of the classification being correct. Aside from that, this function also delivers other useful information, like box boundaries and level information, which has to be parsed out of tab-separated values (TSV) output. With this information at hand, we filter our individual digit predictions by a confidence threshold of 85 percent. Then the prediction with the highest confidence is chosen for each digit, and the complete reading is constructed from these best classifications, which may stem from different thresholding methods. The pictures below show the satisfactory results of the Tesseract classification.
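The selection step can be sketched as follows. The sketch assumes that every segmented digit has been passed to pytesseract once per thresholding method, for example via pytesseract.image_to_data(image, lang="digit_comma"), which returns a TSV string with columns such as conf and text. The function names, the data layout (one dictionary per digit that maps a method name to its TSV output) and the choice to return None when a digit stays below the threshold are assumptions of this illustration.

```python
import csv
import io


def parse_tsv(tsv_text):
    """Extract (text, confidence) pairs of recognized words from TSV output."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    results = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row.get("conf") or -1)
        # Structural rows (page, block, line) carry conf -1 and no text.
        if text and conf >= 0:
            results.append((text, conf))
    return results


def best_digit(tsv_by_method, min_conf=85.0):
    """Most confident prediction for one digit across all methods, or None."""
    best = None
    for method, tsv_text in tsv_by_method.items():
        for text, conf in parse_tsv(tsv_text):
            if conf >= min_conf and (best is None or conf > best[1]):
                best = (text, conf, method)
    return best


def assemble_reading(digits, min_conf=85.0):
    """Join the best prediction per digit; None if any digit is uncertain."""
    chosen = [best_digit(tsv_by_method, min_conf) for tsv_by_method in digits]
    if any(choice is None for choice in chosen):
        return None
    return "".join(choice[0] for choice in chosen)
```

Returning None for an uncertain reading fits the workflow from the introduction, in which images are forwarded to a human operator whenever the system isn’t confident enough.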
In this article, we described how we improved our initial approach to automated meter reading and made it more adaptable to various situations without using any additional data and therefore without data-driven algorithms of our own. Our adaptive thresholding method turned out to be very effective, and Tesseract has proven to deliver impressive results with the right setup and the right language packages.
If the user is given a fixed window in which only the area of interest of the electricity or water meter must be placed, the results shown above can be easily achieved. This method is currently being used by many suppliers of AMR applications.
Since we achieved good results with the Tesseract engine, a next step could be to combine the approach with a suitable data set, so that machine learning or deep learning models perform the segmentation and Tesseract OCR then interprets the segmented digits.
Christian Wolf, Jean-Michel Jolion and Francoise Chassaing.
Text Localization, Enhancement and Binarization in Multimedia Documents.
International Conference on Pattern Recognition (ICPR),
volume 4, pages 1037-1040, 2002.