01. 11. 2015

Do You Really Need Clean Data or Is It An Unnecessary Luxury?

Reading time: 5 minutes

Company management usually stands behind the quest for higher quality data.


Data drives more than just the IT world. The rising influence of mobile communications and the advent of new technologies increased the volume of managed data by 25–50% in 2014. The benefits of managing it are slowly being discovered by public administrations, marketing firms, insurance companies, banks and telecommunication companies. Yet the big data segment is still cloaked in uncertainty. Does quality triumph over quantity here too? Is this huge amount of data correct and effectively used at all? And is it even worth investing in the quality of information?

Information rules the world. It runs businesses and companies as well, because management bases its decisions on it. However, access to data is not in and of itself a victory. As the volume of data and the number of sources grow, the risk of erroneous data and of chaos in administering it grows proportionally. Duplicates appear, and information spread across multiple sources does not pair as it should when each system is set to its own single data format.

So the question arises whether in such cases the volume of data is helpful at all. Apparently not. Or its potential is simply not fulfilled. Does your company have access to accurate and complete information at the precise moment you need it? No? Then it is a clear sign that you are currently managing your often hard-won data to no purpose. Owning the data is not the end, but rather the beginning. It is necessary to nurture and maintain data quality, and ideally to increase it by cleaning the data. That is what Data Governance is about.

However, it is important to note that when comparing the costs and benefits of Data Governance in terms of data quality, only the portion of data that can be identified and evaluated through key processes is relevant. That is to say, the first level of working with data is knowledge of the key processes. If we know them, we can analyse them, identify their data requirements and identify incorrect data. The second level is then the optimization of the process itself, for cases where the correct data is available yet the process still works poorly (or where supplementary data is not available).

It is all a question of needs…

In the Data Governance segment, the question of whether or not to invest in data quality hardly needs asking. Just as in manufacturing or trade, the answer is clearly yes. Quality of products, sales and information should be an absolute standard. Far more interesting is the opposite question: the impact of poor quality information. Qualified estimates put its cost at 22–35% of operating profit.

Quantifying how much poor information actually costs a company is itself part of the service offered by providers of Data Governance systems. The savings come not only from more effective operations and management decisions, but also from better planning of marketing campaigns and a general reduction in errors, which in turn means fewer complicated corrections. For example, because of a seemingly marginal ambiguity in its client data, one company did not discover a link between specific clients and insured events in time. The company failed to detect much of the fraud at all, although the data was available; it was just not properly organized.

Another case of centrally controlled chaos, and a well-known example of how significantly incorrect data management can affect the end user, is the recent failure of the Central Vehicle Register, which contains information on 7.5 million vehicles in operation. Due to an improperly configured data management system, the whole database collapsed. And the result? Angry clients queued for many hours, the company's reputation was irreparably damaged, and exorbitant sums were paid in compensation to the clients affected.

Poor quality information can stem from an inappropriate form of record keeping, but it can also be as simple as a typo in a name or address. A bank, for example, then cannot pair information on which products a customer uses, so the customer may be left out of a marketing campaign or, on the contrary, be included in one by mistake. A well-known example is a company that, due to errors in its database, also addressed long-dead clients. In such cases the errors have not only economic but also moral, ethical and sometimes legal impacts.
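The pairing problem described above can be illustrated with a minimal Python sketch. It is not the method of any particular Data Governance product: the record layout, the sample clients, the `same_customer` helper and the 0.85 threshold are all assumptions made for illustration. It normalizes names and addresses (case, accents, punctuation) and then compares them with the standard library's `difflib.SequenceMatcher`.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_customer(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Treat two records as one customer if name AND address are close enough.

    The threshold is an illustrative assumption; a real project would tune it
    against manually reviewed pairs.
    """
    name_score = similarity(rec_a["name"], rec_b["name"])
    address_score = similarity(rec_a["address"], rec_b["address"])
    return min(name_score, address_score) >= threshold


# Hypothetical records for the same person held in two different systems.
crm = {"name": "Jana Nováková", "address": "Dlouhá 12, Praha 1"}
loans = {"name": "Jana Novakova", "address": "Dlouha 12,  Praha 1"}
other = {"name": "Petr Svoboda", "address": "Krátká 5, Brno"}

if __name__ == "__main__":
    print(same_customer(crm, loans))  # the two spellings are paired
    print(same_customer(crm, other))  # a different client is not
```

An exact comparison of the raw strings would treat `crm` and `loans` as two different people, which is precisely how a client ends up missing from a campaign or being addressed twice.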

How and when to start cleaning data? Data Governance systems are most often adopted by information-based organizations for which large amounts of data are the core business. Firstly, it is necessary to establish specific responsibilities for the quality of information. It is not enough for the company to adopt a resolution that “from now on we will clean data”. Senior management and employees in the most junior positions alike must be fully, and equally, aware of the importance of quality data.

For example, a certain unnamed bank long struggled with a shortcoming in its implementation of Data Governance: responsibility for the data was gradually transferred from management to the staff at the branches. Instead of concentrating on selling products, bankers would update the client's data (address, age, marital status, etc.) at every meeting. In the end, the solution was staff training, the issuing of directives and changes to organizational rules, so that it was clear who collects data and how.

Regulations and the necessary metrics are a crucial part of implementing Data Governance, but so is building a definition and description of all available data, i.e. a Data Dictionary. It describes not only the data in the database, but also the business context of each item.

The most valuable part (and the most labour intensive, because it requires agreement across the company) is precisely the definition of the business terms with which users will work; in other words, the know-how of the company.

This is followed by mapping the business meaning onto the description of data structures in the various systems (sometimes it is even necessary to go down to the processes by which data is acquired). Only at this step do we find out whether the data corresponds to the outputs expected from it. The result is a precise definition of what a given field in the database does or does not contain. By all accounts it is a very challenging task, but all the more important for that. Thanks to the data dictionary, corporate know-how is comprehensive, understandable and uniform for all current employees, and will be clear to future ones.
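As a rough illustration of what one data dictionary item might hold, the following Python sketch pairs a business definition with the physical fields it maps to. The class names, attributes and the sample "active client" entry are hypothetical and do not come from any specific tool; a real dictionary would typically live in a dedicated repository rather than in code.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One business term and where it physically lives (illustrative layout)."""
    term: str
    business_definition: str
    owner: str  # who is responsible for the quality of this item
    includes: list[str] = field(default_factory=list)
    excludes: list[str] = field(default_factory=list)
    # Each physical location is a (system, table, column) triple.
    physical_fields: list[tuple[str, str, str]] = field(default_factory=list)


class DataDictionary:
    def __init__(self) -> None:
        self._entries: dict[str, DictionaryEntry] = {}

    def add(self, entry: DictionaryEntry) -> None:
        key = entry.term.lower()
        if key in self._entries:
            # One agreed definition per term is the whole point.
            raise ValueError(f"term already defined: {entry.term}")
        self._entries[key] = entry

    def lookup(self, term: str) -> DictionaryEntry:
        return self._entries[term.lower()]

    def terms_for_field(self, system: str, table: str, column: str) -> list[str]:
        """Answer 'what does this database field actually mean?'"""
        location = (system, table, column)
        return [e.term for e in self._entries.values() if location in e.physical_fields]


# Hypothetical entry; the systems and columns are made up for the example.
dictionary = DataDictionary()
dictionary.add(DictionaryEntry(
    term="Active client",
    business_definition="A client holding at least one open product.",
    owner="Retail sales",
    includes=["clients with an open account"],
    excludes=["deceased clients", "prospects without a product"],
    physical_fields=[("CRM", "clients", "status"), ("DWH", "dim_client", "is_active")],
))

if __name__ == "__main__":
    print(dictionary.terms_for_field("CRM", "clients", "status"))
```

Keeping the `includes` and `excludes` lists next to the definition mirrors the idea that the dictionary states what a field does and does not contain.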

Purchasing a system definitely takes time, effort and money. But the company reduces the number of mistakes, and with them the corrections and compensation obligations that negatively impact its reputation.

The path to clean data leads through indecisive management, organizational changes, implementing methodologies, determining responsibilities, compiling dictionaries of thousands of items and gathering tools and technologies; all of it hinges on one thing: the decision of management. In many companies, it is a process of several months.

Why opt for cleaning at all? Traditional custom-developed solutions generally lack the capacity for managing and monitoring data quality. As a result, they cannot serve as a source of potential clients. Data Governance, by contrast, has a significant effect on marketing and thus on business: it allows management and employees to see clearly whether clients hold certain products or not, to detect fraud, for example, and to plan campaigns.

“Better late than never” does not apply unconditionally in the Data Governance segment. If companies do not address data quality when creating or processing data, the error will almost certainly turn up later in the final process that uses the data. Management often reaches for the quickest and easiest solution: acquired data is corrected immediately before use in the final process. The problem is that poor quality data then keeps reappearing, because more and more new erroneous data flows into the system. So if the decision is taken to remedy it, always take action at the point where the low quality data primarily originates.
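The idea of acting where poor data originates, rather than patching it just before use, can be sketched as a validation step at the point of entry. The Python below is only an illustration under stated assumptions: the record fields, the rules and the Czech-style postal code pattern are hypothetical, and real rules would come from the data dictionary and the owners of each item.

```python
import re
from datetime import date


def validate_client(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is accepted."""
    errors = []
    if not str(record.get("name") or "").strip():
        errors.append("name is empty")
    # Assumed format: five digits, optionally written as '110 00'.
    if not re.fullmatch(r"\d{3} ?\d{2}", str(record.get("postal_code") or "")):
        errors.append("postal code has an unexpected format")
    birth = record.get("birth_date")
    if not isinstance(birth, date) or birth > date.today():
        errors.append("birth date is missing or in the future")
    return errors


def ingest(records: list[dict], store: list, rejected: list) -> None:
    """Accept clean records; send the rest back to their source with reasons."""
    for record in records:
        problems = validate_client(record)
        if problems:
            rejected.append((record, problems))
        else:
            store.append(record)


# Hypothetical incoming records.
incoming = [
    {"name": "Jana Nováková", "postal_code": "110 00", "birth_date": date(1980, 5, 17)},
    {"name": "  ", "postal_code": "1100", "birth_date": None},
]
store, rejected = [], []
ingest(incoming, store, rejected)

if __name__ == "__main__":
    print(len(store), "accepted;", len(rejected), "returned for correction")
```

Rejected records carry their reasons with them, so the correction happens once at the source instead of repeatedly before every final process.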

Data quality should be in the interest of more or less every company. At the very least, it should be taken into account at the design level: how the data is obtained and how it is worked with. Automated solutions will be useful especially for larger companies. They usually run more data systems, so there is greater scope for possible inconsistencies. A problem arises when data from these systems is treated as a single version of the truth.

Author: Jaroslav Tykal, Master Data Management Consultant at Adastra, s.r.o.