What Can Modern OCR Do?
Just a few years ago, optical character recognition (OCR) solutions worked well enough for the Indo-European group of languages. Typewritten text in some languages was recognized well, while solutions for other languages were still under development. For English and Russian, for example, there were solutions for handwritten text, but they could process only text written in a neat, calligraphic hand. Moreover, the images had to be scanned or photographed in high definition, with the page exactly parallel to the camera.
Even small changes in the tilt of the surface or in the writing style sharply reduced the recognition quality for letters and words.
Similar shooting requirements apply to all recognition systems, whichever language they handle, both modern (German, French, Russian) and ancient (such as ancient Japanese or Burmese).
When these shooting conditions are met, the accuracy of current solutions is at least 90%, which is sufficient for many use cases. When the surface changes, for example when text is printed on a bottle or on fabric, accuracy varies from 65% to 95%. Such cases require a custom solution for each scenario: one for recognizing text on a bottle and a separate one for recognizing text on fabric. There is still no one-size-fits-all solution with acceptable quality.
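Accuracy figures like these depend on how accuracy is defined. The following is a minimal sketch, assuming character-level accuracy derived from edit distance. This is one common convention and not necessarily the one behind the numbers above. The function names and the sample strings are illustrative.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Illustrative label text: ground truth vs. a made-up OCR output with two misread characters.
print(character_accuracy("Best before 2025", "Bost before 2O25"))  # 0.875
```

Two wrong characters out of 16 already bring this sample below 90%, which is why short labels on curved or textured surfaces are so sensitive to individual misreads.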
At the moment, OCR tasks can be solved on almost any device equipped with a processor and a camera. In addition to personal computers, OCR can run on mobile devices and on single-board computers such as the Raspberry Pi. Existing solutions can run on computationally weak devices while sacrificing only a little quality.
In practice, working solutions combine convolutional and recurrent networks, various heuristics, and custom language-processing methods. For example, if a certain symbol is difficult to recognize, the system can analyze the adjacent symbols and check which candidate symbol completes a word that actually exists.
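The neighbor-based check described above can be sketched as follows. This is a simplified illustration, not how any particular OCR engine implements it: the tiny vocabulary, the function name, and the confidence scores are all hypothetical, and a production system would typically use a full dictionary or a language model instead.

```python
from typing import Dict, List, Optional, Set

# Hypothetical vocabulary; a real system would load a full word list.
VOCABULARY: Set[str] = {"clear", "clean", "cleat", "ocean"}


def resolve_uncertain_symbol(
    symbols: List[str],
    uncertain_index: int,
    candidates: Dict[str, float],
    vocabulary: Set[str] = VOCABULARY,
) -> Optional[str]:
    """Fill one uncertain position using the surrounding symbols.

    `candidates` maps each possible symbol to the recognizer's confidence.
    Returns the first known word found, or None if no candidate fits.
    """
    # Try candidates from the most to the least confident.
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    for symbol, _score in ranked:
        word = "".join(
            symbols[:uncertain_index] + [symbol] + symbols[uncertain_index + 1:]
        )
        if word in vocabulary:
            return word
    return None


# The recognizer read "clea?" and is unsure whether the last symbol is m, n, or r.
print(resolve_uncertain_symbol(list("clea?"), 4, {"m": 0.40, "n": 0.35, "r": 0.25}))  # clean
```

Here the most confident candidate, "m", is rejected because "cleam" is not a known word, so the context overrides the raw recognizer score.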
The older classical approaches and solutions based on simple machine learning models are hardly used anymore.
For text processing, you can use ready-made solutions such as Tesseract and ABBYY, as well as the keras-ocr or EasyOCR libraries. They let you build a proof of concept, which you then need to develop into a custom solution.
OCR is not limited to natural-language text. It is also widely used to recognize chemical and mathematical formulas and to process accounting and financial documents.
Many past problems have been resolved, but several OCR problems remain. The performance of OCR systems on mobile devices and on computers without a GPU is still an issue. Without a GPU, some solutions have to be adapted to run on a CPU, and the speed gained this way comes at the cost of recognition quality.
Another problem, as described above, is recognizing symbols on surfaces that are not flat. For example, an inscription on a bottle or the text on a tag attached to clothes, instruments, or air luggage is harder to recognize. As a workaround, barcodes can be used for identification, since they are more robust to recognition errors.
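One reason barcodes tolerate imperfect reads is that many formats carry a check digit, so a misread can be detected and the scan repeated. The following is a minimal sketch, assuming the EAN-13 format; the text above does not name a specific barcode type, and the function name is illustrative.

```python
def ean13_is_valid(code: str) -> bool:
    """Validate the check digit of a 13-digit EAN-13 code."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # The first 12 digits are weighted 1, 3, 1, 3, ... from left to right.
    weighted_sum = sum(
        digit * (3 if index % 2 else 1)
        for index, digit in enumerate(digits[:12])
    )
    expected_check_digit = (10 - weighted_sum % 10) % 10
    return expected_check_digit == digits[12]


print(ean13_is_valid("4006381333931"))  # True: code read correctly
print(ean13_is_valid("4006381333981"))  # False: one digit misread (3 -> 8)
```

Free-form printed text has no such built-in redundancy, so an OCR system has to rely on context, such as a dictionary, to notice a wrong character.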