Financial reports are a key part of the work of financial institutions, since how quickly and accurately they are processed often directly influences a company’s strategy. The amount of data that has to be processed manually is growing rapidly. It is becoming impossible for a person to analyze a report in depth and on time, so automated data processing comes to the forefront.
A sample financial report from the Morningstar agency
Financial reports differ from other documents in that, apart from textual information, they contain many visual components such as graphs and charts. Moreover, the text is scattered across different parts of the page.
In this situation, a significant number of Machine Learning tasks have to be solved. Optical Character Recognition (OCR), Deep Learning, and Natural Language Processing (NLP) form the main toolset for the necessary solutions.
It is very important to correctly locate the required elements among the different textual structures, visual elements, tables, text blocks, and paragraphs. Recognizing the boundary between running text and a tabular presentation is a delicate task. As you can see in the image above, in both cases we have typographic fonts, and the entities differ only in how the text is laid out.
Once the correct structure of the document is determined, various OCR approaches are applied to turn the graphic representation of text into machine-readable text where necessary. The result of this step is always text, but it will contain errors introduced by the recognition algorithms as well as any errors present in the document itself. Thus, the intermediate results must be processed by machine learning systems designed to correct errors.
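To make the error-correction step more concrete, here is a minimal sketch in Python. It is not a machine learning system: it swaps in simple fuzzy string matching from the standard library as a stand-in for a trained correction model. The vocabulary, the function names, and the similarity cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical domain vocabulary; a real system would use a far larger lexicon
# or a trained language model instead of a fixed word list.
VOCABULARY = ["revenue", "dividend", "equity", "assets", "liabilities", "quarter"]


def correct_token(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the token unchanged if none is close enough."""
    lowered = token.lower()
    if lowered in vocabulary:
        return token  # already a known word, keep the original casing
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    # Note: a corrected token comes back in lowercase, as stored in the vocabulary.
    return matches[0] if matches else token


def correct_text(text):
    """Apply token-level correction to whitespace-separated OCR output."""
    return " ".join(correct_token(token) for token in text.split())


print(correct_text("Total revenve and div1dend for 2019"))
```

Tokens that are not close to any vocabulary word, such as numbers, pass through unchanged. The cutoff controls how aggressive the correction is, and would have to be tuned on real OCR output.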
Since reports can be issued in different languages, the Machine Learning systems will differ too. For example, the structure of words in Indo-European languages differs dramatically from that in many Asian languages or in Hebrew.
Having passed this difficult path, we get clean text. However, this is not the final goal. It is important to understand what is written in the report. For this purpose, Natural Language Processing systems are invited onto the stage.
For instance, NLP lets us tackle the challenge of classifying a document by topic. Even financial reports differ: they can be annual, monthly, biweekly, or weekly, and they can have slightly different focuses, such as a profit and loss statement, a stock statement, and so on.
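As a rough illustration of topic classification, the sketch below scores a report against keyword lists and picks the best-scoring topic. This is a deliberately simple keyword-counting approach, not a trained classifier, which a production system would more likely use. The topic labels and keyword sets are hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword lists per report type; illustrative only.
TOPIC_KEYWORDS = {
    "profit_and_loss": {"revenue", "expenses", "profit", "loss", "income"},
    "stock_statement": {"shares", "stock", "dividend", "price", "ticker"},
}


def classify_report(text):
    """Return the topic whose keywords occur most often, or None when nothing matches."""
    # Count lowercase alphabetic words in the report text.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        topic: sum(words[keyword] for keyword in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_report("Net income rose as revenue exceeded expenses."))
```

The same interface could later be backed by a statistical model trained on labeled reports, without changing the calling code.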
One of the most important tasks of NLP is to identify entities in documents, i.e. to understand which company is being written about, what period of time is covered, whether any important event happened, etc. Based on the extracted entities, it is possible to build a knowledge base about the report, so the ML system can start to form its ontology.
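The idea of pulling entities out of a report and storing them as facts can be sketched with a few regular expressions. This is only a rule-based approximation of entity recognition under stated assumptions: company names are assumed to end in a legal suffix such as "Inc" or "Ltd", and periods are assumed to look like "Q3 2020" or "FY 2019". The patterns and the function name are hypothetical, and a real system would typically rely on a trained named entity recognition model.

```python
import re

# Assumed pattern: one or more capitalized words followed by a legal suffix.
COMPANY_RE = re.compile(r"\b((?:[A-Z][A-Za-z&]*\s)+(?:Inc|Corp|Ltd|LLC|PLC)\b)")
# Assumed pattern: fiscal quarters ("Q3 2020") and fiscal years ("FY 2019").
PERIOD_RE = re.compile(r"\b(Q[1-4]\s?\d{4}|FY\s?\d{4})\b")


def extract_entities(text):
    """Collect company names and reporting periods into a small knowledge-base record."""
    return {
        "companies": sorted(set(COMPANY_RE.findall(text))),
        "periods": sorted(set(PERIOD_RE.findall(text))),
    }


sample = "In Q3 2020, Acme Widgets Inc. reported higher revenue than in FY 2019."
print(extract_entities(sample))
```

Each record produced this way could become one entry in the knowledge base about the report. Note that the company pattern is naive: a capitalized word that happens to precede the company name would be captured as part of it.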
In closing, it is worth noting that many problems in financial report processing can be solved using Machine Learning. Only the most obvious solutions have been described here.