Creating Text Model Readers in MVTec HALCON

Published on October 29, 2019 by TIS Marketing.

This post, Creating Text Model Readers in MVTec HALCON, is the fifth in a series of seven posts from Pushing OCR Performance with MVTec HALCON: 1, 2, 3, 4, 5, 6, 7.

Fig. 1: Image of butter package with dot-matrix printed best-by date

This post gives a short overview of the different approaches to optical character recognition (OCR) in HDevelop. The code generated in the previous post is examined in detail, with a focus on working with text model readers. (The generated code is available via the download link at the bottom of this post.)

In general, the code generated by the OCR assistant is based on text model readers and consists of the following steps:

  1. Creation of text model reader; setting of additional segmentation parameters
  2. Reading of OCR classifier
  3. Reading of image; definition of region of interest
  4. Segmentation of characters
  5. Classification of objects segmented in step 4

While additional steps are performed between those listed above to ensure consistent border and domain handling, these operations do not directly affect the OCR workflow and are therefore not covered here. Looking at step 1 (creation of the text model reader): a text model reader creates a text model that describes the text to be segmented. There are two modes in which a text model can be created: auto and manual. The generated code uses the manual mode; because of the options available in the assistant, the manual mode is always selected when code is generated automatically.
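A minimal HDevelop sketch of this step, assuming the manual mode used by the generated code (the character size of 25 x 25 pixels is an illustrative value, not taken from the post):

    * Create a text model reader in manual mode; no classifier is passed here.
    create_text_model_reader ('manual', [], TextModel)
    * Set the approximate character size (assumed values for illustration).
    set_text_model_param (TextModel, 'manual_char_width', 25)
    set_text_model_param (TextModel, 'manual_char_height', 25)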

One major advantage of the auto mode is that a classifier can be passed directly when the text model is created, so that segmentation and classification are performed in a single step. Retrieving results is also slightly more convenient in this mode. Additionally, a range for the anticipated width and height of the segmented characters can be defined. A further advantage relevant here is an advanced feature of the auto mode: the segmentation of dot-matrix characters.
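As a hedged sketch of the auto mode, the following passes a pretrained classifier at creation and then defines a size range and dot-print segmentation; the font file name and the numeric values are assumptions chosen to fit the dot-matrix example:

    * Auto mode: pass a pretrained classifier when creating the text model.
    create_text_model_reader ('auto', 'DotPrint_0-9A-Z_NoRej.omc', TextModel)
    * Define a range for the anticipated character height (assumed values).
    set_text_model_param (TextModel, 'min_char_height', 20)
    set_text_model_param (TextModel, 'max_char_height', 40)
    * Enable the segmentation of dot-matrix print.
    set_text_model_param (TextModel, 'dot_print', 'true')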

While it's possible to segment dot-matrix print with a model created in manual mode, there are fewer parameters with which to control this behavior. In contrast to the auto mode, the manual mode does not allow a range for character width and height to be defined, only an approximate size. The manual mode is not without advantages, though: it can segment imprinted letters and restrict segmentation to uppercase letters. It also offers an option to control how small fragments (which might otherwise be discarded as clutter or noise) are treated during segmentation.

There are further differences between the two modes, but the choice of mode always depends heavily on the specific application and on the restrictions required for character segmentation. The manual mode may be the one selected when using the assistant, but most HALCON guides recommend the auto mode.

After selecting the right mode for the text model reader, the crucial parameters are defined via the operator set_text_model_param. In general, every parameter available for the manual mode carries the prefix manual_; all remaining parameters apply to the auto mode.
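The following lines illustrate this naming convention with two of the manual-mode options mentioned above (dot-matrix print and uppercase-only segmentation); the exact parameter names are assumed to match the current HALCON release:

    * Manual-mode parameters carry the prefix 'manual_'.
    set_text_model_param (TextModel, 'manual_is_dotprint', 'true')
    set_text_model_param (TextModel, 'manual_uppercase_only', 'true')
    * The auto-mode counterpart drops the prefix, e.g. 'dot_print'.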

With step 1 complete, it's time to look at step 2 (reading of the OCR classifier). In manual mode it is necessary to read a suitable classifier explicitly using the operator read_ocr_class_mlp. The suffix mlp indicates that the underlying classifier is a multilayer perceptron, one of the common approaches to OCR in HALCON. (To learn more about the different types of classifiers used in HALCON, consider attending one of our regular Machine Learning in HALCON training courses.) Options for the various pretrained classifiers were discussed in a previous blog post. In auto mode, a suitable classifier can be passed directly when creating the text model reader; in that case, step 2 is unnecessary and can be skipped.
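In code, step 2 amounts to a single operator call; the font file below is one of HALCON's pretrained MLP fonts and is an assumed choice for this example:

    * Read a pretrained MLP classifier for use with manual-mode segmentation.
    read_ocr_class_mlp ('Industrial_0-9A-Z_NoRej.omc', OCRHandle)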

To finally perform the segmentation of the text visible in the input image, the operator find_text is applied. It returns a result handle containing information about the segmented regions and, if the auto mode was used, about the classification results. These results can then be accessed via the operators get_text_object (for the segmented regions) and get_text_result (for the classification results).
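A short sketch of segmentation and result access (the result names 'all_lines' and 'class' follow the auto-mode convention; in manual mode, the object names carry the manual_ prefix, e.g. 'manual_all_lines'):

    * Segment the text; in auto mode this also classifies the characters.
    find_text (Image, TextModel, TextResult)
    * Retrieve the segmented character regions ...
    get_text_object (Characters, TextResult, 'all_lines')
    * ... and, in auto mode, the classification result.
    get_text_result (TextResult, 'class', Classes)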

If segmentation and classification are performed separately, the operators do_ocr_multi_class_mlp or do_ocr_single_class_mlp can be used to classify the segmented regions. The multi-class operator can process multiple regions at once but yields only the single best class for each region. The single-class operator can process only one region at a time but provides information about alternative classes and their confidences. Applying the classifier to multiple regions simultaneously is slightly faster than executing do_ocr_single_class_mlp iteratively on the individual regions.
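Both variants in a hedged sketch (Characters is assumed to hold the regions segmented above; the value 2 requests the two best classes per call of the single-class operator):

    * Classify all segmented regions at once; one best class per region.
    do_ocr_multi_class_mlp (Characters, Image, OCRHandle, Classes, Confidences)
    * Alternatively, classify one region and retrieve the two best classes.
    select_obj (Characters, SingleChar, 1)
    do_ocr_single_class_mlp (SingleChar, Image, OCRHandle, 2, Class, Confidence)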

To sum it up: Text model readers define a text model that describes the text to be segmented. There are two modes, auto and manual, each with its own advantages and disadvantages. After segmentation, classification has to be performed on the individual regions that were segmented using the text model.

Sometimes, pretrained classifiers do not yield satisfying results, in which case custom classifiers become necessary; this topic will be discussed in subsequent posts, which also cover the creation of a training file and the usage of the training file browser in HDevelop.

Please click here to download generated code from the previous post.