Big Data Increases the Potential for Business Offers Up to Ten Times
30. 10. 2020
Reading time: 5 minutes
The advent of digitisation has brought with it the need to make decisions constantly: quickly and, most importantly, correctly. The more of this we can manage automatically, the better we can counter hidden threats and seize new opportunities, while generating a profit, of course. The technologies and data that were more than sufficient for previous generations of our processes are no longer enough for the new ones. That's why it's necessary, wherever possible, to use modern technologies that work with correct data, and that are available in time, i.e. not tomorrow, in a week, or never.
Real-time decision-making is based on the technical formalisation and automated resolution of a specific problem. Such a solution must behave deterministically, and must be powerful, reliable and scalable; in short, all strengths and no weaknesses. Only such a solution can be used across a wide range of tasks: from small improvements in individual customer interactions via digital channels, through various types of control mechanisms that ensure security, to the comprehensive management of critical business processes without the need for human input.
Only a few years ago, this wasn't technically possible, but new technologies and growing computing power make it feasible today. Some visionaries even talk of fully digital members of companies' boards and middle management, which would completely replace the sophisticated work of living people. There are no limits to imagination and innovation. On the other hand, the human mind generally overestimates technological potential in the short term, and conversely underestimates it in the long term.
If we're considering building a real-time decision-making solution, what will we need? Above all, a practical task for which fast decision-making means a fundamental competitive advantage or, even better, direct profit. Fast decision-making for its own sake isn't a good idea, because given the solution's relatively high costs it will never pay off. We must never forget the financial aspects of these tasks: the chosen use case should always have clearly measurable financial and non-financial KPIs (Key Performance Indicators).
What we should definitely never do is start building a decision-making solution from the technology and only then look for a suitable task. Unfortunately, we encounter this approach relatively frequently in practice. Technologies suitable for real-time decision-making are excellent and exciting; they're the dream of every technology enthusiast who works with data at least to some extent. In most cases, however, an approach which strives first and foremost for technological perfection paradoxically leads to solutions that are problematic from a business perspective, with costs exceeding any revenue.
After defining the task(s) to be resolved, we can begin to address the architecture itself, and we shouldn't underestimate any of its components. Real-time decision-making must be based on data and the appropriate decision-making algorithms. In real-time decision-making, the data, and the way we work with it, resemble a river: the data originates (springs from) somewhere; there are places it must flow through and places where it's collected; in some places it's used (for drinking, watering plants, turning turbines); and there's a place where it ceases to exist. Most importantly, just like a river, this type of data flow cannot be stopped while we wait for better or different conditions. For these data flows, the decision-making algorithms function analogously to floodgates, which not only enable the flow to be optimised, but also convert the power of the water into something beneficial.
Logical architecture of the platform for real-time decision making
From a technological perspective, real-time decision-making is divided into several layers.
1. Data acquisition
The first area is the acquisition and receipt of data from data sources. Data sources can be relational databases, data streams, files, data replications, various types of signals from IoT devices, etc. When working with unstructured data, it's advisable to use Edge Computing wherever possible, which reduces and prepares the data for use in the decision-making solution. Technically, at this level we're talking about Change Data Capture, ETL for data transfer, streaming, messaging, distributed file systems, APIs, etc. The greatest risk at this level is legacy systems, which are often very difficult to integrate and to obtain data from continuously.
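To make the Change Data Capture idea concrete, here is a minimal sketch: two snapshots of a source table are diffed, and only the changed rows are emitted as events for the downstream layers. All names and structures here are illustrative assumptions, not the interface of any real CDC tool, which would typically read the database's transaction log instead of comparing snapshots.

```python
def capture_changes(previous, current):
    """Diff two table snapshots keyed by primary key and emit change events."""
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append({"op": "insert", "key": key, "row": row})
        elif previous[key] != row:
            events.append({"op": "update", "key": key, "row": row})
    for key in previous:
        if key not in current:
            events.append({"op": "delete", "key": key})
    return events

# Illustrative snapshots of a customer table at two points in time.
before = {1: {"name": "Alice", "limit": 100}, 2: {"name": "Bob", "limit": 50}}
after_ = {1: {"name": "Alice", "limit": 200}, 3: {"name": "Carol", "limit": 75}}

changes = capture_changes(before, after_)
```

Downstream, only these three deltas (one update, one insert, one delete) travel through the pipeline, rather than a full reload of the table.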
2. Data transformation
The second layer should ensure data transformation. It should support various processing frequencies, from real-time in the form of data pipelines, through micro-batches, to classic batch processing, so that processing is efficient, fast and reliable. Data transformations should service and populate all data repositories, regardless of whether they're databases or data streams. However, the transformations should never forcibly disrupt the continuity of the data flow.
An example of such an undesired disruption is receiving a message from the data stream, transforming it into database structures, and then generating a practically identical new message from the change in those structures. How can it be done better? Messages should be stored in the rawest possible form, enriched with static data (the enrichment can be done dynamically), and then processed directly. For more complex logic spanning multiple messages, it's possible to use the older, well-proven technologies for Complex Event Processing, or the corresponding functionality of newer Stream Processing frameworks.
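The enrich-then-process pattern described above can be sketched as follows: the raw message is kept untouched, static reference data is joined in at read time, and the enriched event is processed directly, with no detour through a database and back into a nearly identical message. The field names and the toy decision rule are assumptions for illustration only.

```python
# Static master data, e.g. served from the data repository.
STATIC_CUSTOMERS = {
    "c-42": {"segment": "retail", "country": "CZ"},
}

def enrich(raw_message):
    """Attach static attributes dynamically, without mutating the raw message."""
    enriched = dict(raw_message)
    enriched["customer"] = STATIC_CUSTOMERS.get(raw_message["customer_id"], {})
    return enriched

def process(event):
    """Toy decision rule: flag large payments from retail customers."""
    return event["amount"] > 1000 and event["customer"].get("segment") == "retail"

raw = {"customer_id": "c-42", "amount": 2500}  # message kept in its rawest form
flagged = process(enrich(raw))
```

The raw message survives unchanged, so it can be replayed or re-enriched later if the static data or the decision logic evolves.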
The greatest risk in this area is the flooding of the transformation layer with large volumes of data generated suddenly, for example during the daily financial statements. Because of these spikes, it's necessary to implement multiple optimised technical transformations for the same transformation logic, and to switch between them dynamically as needed. To keep the technical transformations consistent, it's advisable to use an approach such as Metadata-Driven Development, with which it's possible, using logical transformation metadata and a set of optimised technical transformation templates, to achieve the desired result easily and quickly.
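The metadata-driven idea can be illustrated with a small sketch: one logical transformation is described once as metadata, several optimised technical implementations exist for it, and the runtime switches between them based on the incoming data volume. The threshold, the rule format, and the two implementations are assumptions chosen for illustration.

```python
# Logical transformation metadata: describes WHAT to compute, not HOW.
LOGICAL_RULE = {"field": "amount", "op": "sum"}

def streaming_impl(rows, rule):
    """Row-at-a-time template: low latency, suited to steady trickles."""
    total = 0
    for row in rows:
        total += row[rule["field"]]
    return total

def batch_impl(rows, rule):
    """Bulk template: high throughput, suited to sudden spikes."""
    return sum(row[rule["field"]] for row in rows)

def run_transformation(rows, rule, spike_threshold=10_000):
    """Pick the optimised technical template dynamically, by load."""
    impl = batch_impl if len(rows) >= spike_threshold else streaming_impl
    return impl(rows, rule)

small = [{"amount": i} for i in range(5)]        # normal trickle
large = [{"amount": 1} for _ in range(20_000)]   # end-of-day spike
```

Because both implementations are generated from the same logical rule, the results stay consistent no matter which template handles a given spike.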
3. Data repository
The penultimate layer is the data repository. Historically, data repositories were referred to as Operational Data Stores (ODS). The current generation of ODS has undergone a long evolution, including a certain shift in its architectural role, so today they're often referred to by newer names such as Data Hub or Digital Integration Hub. A data repository should have the character of a database, both for static data in the form of relational structures primarily containing master data or documents, and for dynamic data in the form of message queues. An optimal combination is a hybrid of an RDBMS with a streaming platform: the RDBMS manages shared master data and other static data sets, while the streaming platform services both the data flow itself and its integration.
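A toy sketch of this hybrid repository, with loud assumptions: an in-memory SQLite database stands in for the RDBMS holding static master data, and a simple in-process queue stands in for the streaming platform carrying the dynamic flow. A real Data Hub would use a production database and a streaming platform; the table and field names are invented for the example.

```python
import sqlite3
from collections import deque

# Static side: RDBMS holding shared master data.
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, segment TEXT)")
rdbms.execute("INSERT INTO customer VALUES ('c-1', 'corporate')")

# Dynamic side: message queue carrying the data flow.
stream = deque()
stream.append({"customer_id": "c-1", "amount": 900})

def next_enriched_event():
    """Pop one message from the flow and join it with static master data."""
    msg = stream.popleft()
    row = rdbms.execute(
        "SELECT segment FROM customer WHERE id = ?", (msg["customer_id"],)
    ).fetchone()
    msg["segment"] = row[0] if row else None
    return msg

event = next_enriched_event()
```

The design point is the division of labour: the RDBMS answers point lookups on slowly changing data, while the queue preserves the order and continuity of the flow.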
4. Distribution layer
The last layer is the distribution layer, which is primarily realised through APIs. These fall into two groups: data APIs and decision-making APIs. Data APIs make the unified, integrated data usable even outside the decision-making context. Decision-making APIs create envelopes for models built with the help of machine learning and artificial intelligence. These models learn from the data repository; their development is entirely beyond the scope of this article.
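The split between the two API groups can be sketched as two functions: one serving unified data to any consumer, the other wrapping a model behind a stable decision envelope. The trivial scoring rule stands in for a trained ML model, and every route, field and threshold here is an assumption for illustration, not a prescribed interface.

```python
# Unified, integrated data as served by the repository (illustrative).
REPOSITORY = {"c-7": {"income": 48_000, "defaults": 0}}

def data_api_get_customer(customer_id):
    """Data API: expose unified data even outside the decision context."""
    return REPOSITORY.get(customer_id)

def decision_api_score(customer_id):
    """Decision API: a stable envelope around a (here, trivial) model."""
    features = data_api_get_customer(customer_id)
    if features is None:
        return {"decision": "reject", "reason": "unknown customer"}
    approved = features["income"] > 30_000 and features["defaults"] == 0
    return {"decision": "approve" if approved else "reject"}

result = decision_api_score("c-7")
```

The envelope matters more than the model inside it: the model can be retrained or replaced without consumers of the decision API noticing any change of interface.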
It may come as a surprise that, even in real-time decision-making, the universal Pareto principle applies: 80% of the effort must be devoted to data preparation, and only 20% to the real-time decision-making itself. Operational reporting focused on decision-making also falls within the distribution layer.
Of course, we mustn't forget that every real-time decision-making solution must be integrated at the company's process and technical levels. Data must enter the solution, and the solution must provide data and decision-making services; more importantly still, it must make correct decisions. We must also remember to define appropriate governance for the entire solution.
In information management, real-time decision-making is a challenge. In our experience, building this type of solution rests on two basic pillars: well-managed data flows and the decision-making algorithms built on top of them.
Only if both are successfully implemented can the desired results be achieved, and can truly qualified and automated real-time decision-making take place.
The article by Martin Bém, Senior Data Architect, was published in CIO BusinessWorld vol. 10/2020.