Selecting Suitable Classifiers in MVTec HALCON

Published on October 22, 2019 by TIS Marketing.

This post, Selecting Suitable Classifiers in MVTec HALCON, is the third in a series of seven posts from Pushing OCR Performance with MVTec HALCON: 1, 2, 3, 4, 5, 6, 7.

After successful segmentation, the next step is to select a suitable classifier for the OCR (OCR Classifier tab, Fig. 1 below). HALCON ships with an extensive list of pretrained classifiers, which may seem overwhelming at first. Fortunately, the available classifiers are sorted by font and character set, which can be displayed by clicking on the magnifying lens icon (top right). There are always two versions of a classifier: one trained without a rejection class (_NoRej) and one trained with a rejection class (_Rej). Classifiers trained with a rejection class are able to distinguish characters from noise or background clutter. As a result, a symbol that cannot be classified as one of the characters in the character set is classified as "not assignable" and is put into the rejection class.
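The naming scheme described above (font, optional character set, then _Rej or _NoRej) can be sketched in a few lines of Python. This is only an illustration: the helper `classifier_name` is hypothetical and not part of HALCON, and the font and character-set strings used here ("DotPrint", "0-9A-Z", "Universal") are assumptions that should be checked against the list shown in the OCR assistant.

```python
def classifier_name(font: str, charset: str = "", rejection: bool = True) -> str:
    """Build a classifier name of the form <font>[_<charset>]_<Rej|NoRej>.

    Hypothetical helper for illustration only; the exact names of the
    pretrained classifiers should be taken from the OCR assistant.
    """
    parts = [font]
    if charset:
        # An empty charset stands for "no specific ending", i.e. the full symbol range.
        parts.append(charset)
    parts.append("Rej" if rejection else "NoRej")
    return "_".join(parts)


# Restricted symbol set (letters and digits), trained with rejection class.
restricted = classifier_name("DotPrint", "0-9A-Z", rejection=True)

# Full symbol range, trained without rejection class.
full_range = classifier_name("Universal", rejection=False)

print(restricted)
print(full_range)
```

Keeping the suffix as a boolean parameter makes it easy to compare both versions of the same font on one image.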

Fig. 1: OCR assistant: OCR Classifier tab

For dot matrix print, either the Dot Print classifier or the Universal classifier can be used. Since the sample image uses the whole range of symbols, a classifier without a specific ending for a symbol set can be selected. The effect of rejection-class-trained classifiers becomes apparent when a classifier for a smaller set of symbols is used. For example, if the rejection-class-trained classifier for Dot Print is trained on a symbol set of letters A-Z and numbers 0-9, symbols such as colon, point and slash are classified into the rejection class (Fig. 2). Fig. 3 shows what happens when the classifier is trained without a rejection class: the symbols are always assigned a class, even if the classification confidence is very low.

Fig. 2 (above, left): Classification via rejection-class-trained classifier and character set A-Z 0-9: symbols like colon, point and slash are classified as rejected. Fig. 3 (above, right): Classification via classifier without rejection class and character set A-Z 0-9: each symbol is assigned the character class with the highest confidence, even if this confidence is very low.
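The difference between Fig. 2 and Fig. 3 can be illustrated with a small toy model in Python. This is a simplified sketch and not HALCON's implementation: it assumes the classifier yields one confidence per character class plus, for the _Rej version, one confidence for the rejection class. The function `classify`, the label `REJECTED` and all confidence values are made up for illustration.

```python
from typing import Dict, Optional, Tuple

REJECTED = "<rejected>"  # placeholder label for the rejection class (assumption)


def classify(char_scores: Dict[str, float],
             reject_score: Optional[float] = None) -> Tuple[str, float]:
    """Pick the winning class for one segmented symbol.

    char_scores:  confidence per character class (e.g. A-Z, 0-9).
    reject_score: confidence of the rejection class, or None to model a
                  classifier that was trained without a rejection class.
    """
    best = max(char_scores, key=char_scores.get)
    if reject_score is not None and reject_score > char_scores[best]:
        return REJECTED, reject_score
    return best, char_scores[best]


# Made-up confidences for a colon, which is not part of the A-Z 0-9 set.
colon_scores = {"1": 0.12, "I": 0.10, "8": 0.05}

# With rejection class (as in Fig. 2): the colon ends up in the rejection class.
with_rej = classify(colon_scores, reject_score=0.73)

# Without rejection class (as in Fig. 3): the colon is forced into a character
# class, here "1", despite the very low confidence.
without_rej = classify(colon_scores)

print(with_rej)
print(without_rej)
```

In this toy model the only difference between the two versions is whether the rejection class competes with the character classes, which is enough to reproduce the behavior described for the restricted A-Z 0-9 character set.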

The next post takes a look at the Results tab. (Please see navigation at the top of this page for additional posts.)
