Optical Character Recognition (OCR) is a well-studied subject with many application areas. OCR results in various limited problem areas are promising; however, building a highly accurate OCR application is still problematic in practice. This thesis discusses the problem of recognizing and confirming Bingo lottery numbers from a real lottery field, and a prototype for Android phones is implemented and evaluated. The OCR library Tesseract and two Artificial Neural Network (ANN) approaches are compared in an experiment and discussed. The results show that training a neural network for each number gives slightly higher accuracy than Tesseract.
BACKGROUND / THEORY
In this chapter we describe the Android operating system. Background on various OCR and image processing libraries is presented. Furthermore, ANNs and Back Propagation learning are explained briefly.
In a Feed Forward neural network, output signals from neurons travel in only one direction, towards the layer called the output layer. If any output signal from a neuron travels in a cycle, the network belongs to the Recurrent neural network class, which is not used for the problem in this thesis. A typical model of a Feed Forward neural network is shown in Figure 2.3.
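The one-directional signal flow can be illustrated with a minimal sketch of a forward pass. The layer sizes 324, 20, 10 are taken from one of the single network configurations mentioned in the results; the sigmoid activation, the random weight initialization and all function names are assumptions made for illustration and are not taken from the prototype.

```python
import math
import random


def sigmoid(x):
    # Assumed activation function; the thesis summary does not name one.
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_inputs, n_neurons, rng):
    # Each neuron is a list of n_inputs weights followed by one bias value.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]


def feed_forward(layers, inputs):
    # Signals only move forward: the activations of one layer are the
    # inputs of the next, and nothing is fed back to an earlier layer.
    activations = list(inputs)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron[:-1], activations))
                    + neuron[-1])
            for neuron in layer
        ]
    return activations


rng = random.Random(0)
sizes = [324, 20, 10]  # input, hidden and output layer sizes
network = [make_layer(n_in, n_out, rng)
           for n_in, n_out in zip(sizes, sizes[1:])]

pixels = [0.0] * 324  # hypothetical blank input image
outputs = feed_forward(network, pixels)  # one activation per output neuron
```

With untrained random weights the ten output values carry no meaning; Back Propagation learning would adjust the weights so that the output neuron for the correct digit gives the highest activation.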
In this chapter the scientific approach for this thesis is explained and motivated. Prototype requirements are described. Some information about reliability is provided. Finally, some information about the proper use of the application is presented.
- Scientific approach
This chapter gives details about the ANN implementation and the prototype.
Figure 4.4 shows the main menu of the prototype. The main menu has four buttons. Capture starts the camera to take a picture of a bingo grid. Enter numbers is for entering the correct bingo numbers. Train opens a way to collect training data and train networks. Test train is for testing the training.
Figure 4.6 is the helper image presented to the user before any numbers are actually recognized. It should not be confused with recognizing numbers, because at this stage the numbers have not been run through Tesseract or the ANNs. In this picture the grid is recognized properly and the user should press the Recognize button.
Figure 4.9 shows the settings for the prototype. Here one can choose whether Tesseract, a single network or multiple networks are used during recognition.
RESULTS / EMPIRICAL DATA
This chapter contains data collected from the experiment. Chapter 5.1 outlines the experiment. The following chapters present results of the experiment for different recognition approaches.
- ANN comparison
- Single network results
- Multiple network results
- Tesseract results
- Google Goggles and Prototype test
The accuracies for single network, multiple network and Tesseract are 77.81%, 96.84% and 96.18% respectively. It should be noted that accuracy on the training set is considerably lower for the single network approach than for the multiple network approach. The two approaches differ in how they use training data; see chapter 4.1 for details. Training time was not included in the experiment, but both networks had at most 1000 iterations for training.
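One common way in which such approaches differ in their use of training data is in how the target values are encoded. The sketch below is an assumption made for illustration, not the scheme described in chapter 4.1: a single network with ten outputs receives one target vector per sample, while each of ten per-digit networks receives a yes/no target for every sample. The function names are hypothetical.

```python
def one_hot_targets(labels, n_classes=10):
    # Single network (assumed encoding): one target vector per sample,
    # with a 1.0 at the position of the correct digit.
    return [[1.0 if c == label else 0.0 for c in range(n_classes)]
            for label in labels]


def per_digit_targets(labels, n_classes=10):
    # Multiple networks (assumed encoding): the network for digit d sees
    # every sample, labeled 1.0 if it shows d and 0.0 otherwise.
    return {d: [1.0 if label == d else 0.0 for label in labels]
            for d in range(n_classes)}


labels = [3, 7, 3]  # hypothetical digit labels for three training images
single = one_hot_targets(labels)
multiple = per_digit_targets(labels)
```

Under this assumed encoding, every per-digit network is trained on the full data set but only has to learn a two-class decision.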
Some extra tests were made with single network configurations 324,20,10 and 324,40,10 and 1000 iterations, but the resulting accuracy was even lower. With single network configuration 324,100,10 and 2500 training iterations the results improved by 3% to 5%. Errors in grid recognition are not included in any of the results, which means that the practical results of the prototype are lower. Usually several pictures of the same grid need to be taken.
The results of Tesseract might be improved if images of the correct DPI are provided. The results of the ANNs can probably be improved if the number in the training and recognition images is always moved to one corner or always centered. It should be noted that the ANNs were trained specifically with data and images for the problem presented in this thesis, whereas Tesseract is a more general engine.
The first attempt at making the helper methods described in the Introduction was unsuccessful. It consisted of manipulating the Android camera buffer before the preview to draw a rectangle around the bingo grid. The frame rate of the preview dropped very low, and the rectangle did not stay fixed but moved around from frame to frame. The OpenCV library, however, has documented support for Android camera manipulation, so with extra work it should be possible to achieve this. The final solution is simpler: the user is presented with a cut-out black and white bingo grid if the picture was taken correctly. The user can then proceed to the recognition phase.
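The suggested centering step could look roughly like the following sketch. It assumes the digit image is a binarized grid of 0 (background) and 1 (ink) values stored as a list of rows; the function name and the representation are hypothetical and not taken from the prototype.

```python
def center_digit(image):
    # image: list of equally long rows containing 0 (background) or 1 (ink).
    height, width = len(image), len(image[0])
    rows = [r for r in range(height) if any(image[r])]
    cols = [c for c in range(width)
            if any(image[r][c] for r in range(height))]
    if not rows:
        # Blank image: nothing to center, return an unchanged copy.
        return [row[:] for row in image]

    # Bounding box of all ink pixels.
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    box_h, box_w = bottom - top + 1, right - left + 1

    # Top-left corner that places the box in the middle of the image.
    new_top = (height - box_h) // 2
    new_left = (width - box_w) // 2

    centered = [[0] * width for _ in range(height)]
    for r in range(box_h):
        for c in range(box_w):
            centered[new_top + r][new_left + c] = image[top + r][left + c]
    return centered


# Hypothetical 6x6 image with a two-pixel vertical stroke in the top-left corner.
sample = [[0] * 6 for _ in range(6)]
sample[0][0] = 1
sample[1][0] = 1
result = center_digit(sample)
```

Applying the same normalization to both training and recognition images would mean that the network always sees the digit in the same position, which is the property the paragraph above asks for.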
This thesis compared Tesseract and two different neural network approaches for the purpose of optical character recognition. A single test was run for Tesseract, and 10-fold cross-validation was performed on the ANNs. A prototype was built for Android that can recognize numbers from a bingo lottery. A small comparison between Google Goggles and the prototype was performed.
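For readers unfamiliar with the procedure, 10-fold cross-validation can be sketched as follows. This is a generic illustration, not the code used in the experiment; `train_fn` and `predict_fn` are hypothetical callbacks standing in for training and querying a network.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    # Shuffle the sample indices once, then deal them into k folds.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, test


def cross_validate(samples, labels, train_fn, predict_fn, k=10):
    # Train on k-1 folds, test on the remaining fold, and average the
    # accuracy over all k rounds. Assumes len(samples) >= k so that no
    # test fold is empty.
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Each sample is used for testing exactly once and for training in the other nine rounds, so the averaged accuracy is not tied to one particular split of the data.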
This section gives short answers to the research questions.
- RQ1: What is the most accurate library to use for doing OCR of digits: Tesseract or ANN?
- RQ1.1: How does our prototype compare to Google Goggles Sudoku recognition?
The results show that it is possible to achieve the same accuracy with ANNs as with Tesseract; the difference in results is not significant. It took about 2 days to collect the training data for the ANN approach and another 6 hours to train. Whether to use Tesseract or ANNs is a personal choice; however, when using ANNs, the multiple network approach is the recommended choice.
The results show that the total recognition rate for the prototype (35%) is lower than that of Google Goggles (80%). The resulting data indicates that the multiple network approach performs better than the single network approach. This comparison was not the goal of the study. There is not enough data to say whether the single network is slower to train or less accurate. Whether multiple networks perform better in general could be an area of further research.
Authors: Henno JOOSEP