
Data integration

We integrate data from a wide variety of sources, always keeping data governance in mind

We integrate data from many different sources in order to unify and consolidate them, make them available, and facilitate reporting. Throughout the integration process, we comply with the data governance and data anonymization requirements of the GDPR as well as with the organization’s internal policies.

We usually handle data from the following sources:
  • Public resources
  • Database systems
  • File-based storage
  • Cloud storage

The benefits of data integration

Advanced Data Analytics
  • How we work: Data integration makes it possible to merge relevant data, transform them, and gain a better overview of the data as a whole.
  • Benefit for customers: Customers gain a wide range of options for advanced data analytics.

Data Quality
  • How we work: Data integration facilitates standardization, cleaning, enrichment, and validation.
  • Benefit for customers: We improve data quality and deliver reliable, clear reports to end users.

Data Governance
  • How we work: As part of the data integration process, we also standardize and centralize the metadata.
  • Benefit for customers: We ensure comprehensive data governance.

Data Anonymization
  • How we work: We can mask the integrated data using advanced tools.
  • Benefit for customers: Data are anonymized according to the specific needs of the organization or department. We can implement anonymization during data transfers as well as in any layer of the target storage (a small masking sketch follows this overview).

Security
  • How we work: We focus on security when storing data (to prevent unauthorized access) and when making them available on the platform.
  • Benefit for customers: We ensure data security in all environments and layers, and for all user roles, in accordance with the company’s data governance.

Data Classification
  • How we work: We classify data for each data source, starting from the lowest level. In practice, this means that different datasets from the same data source may be classified separately.
  • Benefit for customers: We build separate environments that only authorized users can access.

Automated Data Orchestration
  • How we work: We apply automated orchestration to guarantee that operations are performed regularly and smoothly.
  • Benefit for customers: Integration workflows remain consistent and follow predefined rules.
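
The Data Anonymization point above can be illustrated with a small PySpark sketch that masks a few customer columns during integration. The column names, masking rules, and paths are assumptions for the example; in practice, the rules come from the organization’s data governance and are applied with dedicated anonymization tools.

```python
# A minimal sketch of column-level masking during integration, assuming PySpark.
# Column names, paths, and masking rules are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("masking-demo").getOrCreate()

customers = spark.read.parquet("/data/optimized/customers/")

anonymized = (
    customers
    # irreversible hash of the national ID, so records stay joinable but unreadable
    .withColumn("national_id", F.sha2(F.col("national_id").cast("string"), 256))
    # keep only the year of birth instead of the full birth date
    .withColumn("birth_year", F.year("birth_date")).drop("birth_date")
    # drop direct identifiers that reporting does not need
    .drop("email", "phone")
)

anonymized.write.mode("overwrite").parquet("/data/mart/customers_anonymized/")
```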

Data integration workflow

During the data integration process, we analyze the source data and the output that has been requested. At the same time, we process and supplement the metadata, which is then used for the data integration.

The integrated data are usually transferred into several layers, and in each of those, the data may serve a different purpose and take a different form; a short sketch of this flow follows the layer overview below.

Top layer - Data mart
  • The data are prepared and cleaned for data analysis, reporting, and machine learning
  • Based on user needs, multiple data sets can be consolidated into one data mart
  • One or more layers may be added between the top layer and layer 2. If required, data aggregation and deduplication can be carried out here 
  • Permissions: other users who want to download and report the final data may be granted access

Special environment – Group & Personal workspace
  • This is a test environment for individuals or for a specific team or project
  • It is completely separated from the standard environments with regard to permissions and development support

Layer 2 - Optimized
  • Initial data transformations are performed here, including casting the data to the target data types defined in the metadata
  • Permissions: in addition to administrators and integration developers, select end users may also have access so that they can already verify data correctness in the second layer

Layer 1 - Landing
  • The data are stored here in the same format as they were ingested
  • This layer is used to check whether the data were damaged during transmission
  • Permissions: only administrators and integration developers can access the data in this layer
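
To make the layers more concrete, here is a minimal PySpark sketch of data flowing from the landing layer through the optimized layer into a data mart. The paths, column names, and transformations are illustrative assumptions, not the exact setup of any particular project.

```python
# A minimal sketch of the layered flow described above, assuming PySpark.
# All paths, columns, and cast rules below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("layered-integration").getOrCreate()

# Layer 1 - Landing: data stored exactly as ingested (e.g. raw CSV files)
landing = spark.read.option("header", True).csv("/data/landing/orders/")

# Layer 2 - Optimized: initial transformations and casting to target data types
optimized = (
    landing
    .withColumn("order_id", F.col("order_id").cast("long"))
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)
optimized.write.mode("overwrite").parquet("/data/optimized/orders/")

# Top layer - Data mart: deduplicated and aggregated data ready for reporting
data_mart = (
    optimized
    .dropDuplicates(["order_id"])
    .groupBy("order_date")
    .agg(F.sum("amount").alias("daily_revenue"))
)
data_mart.write.mode("overwrite").parquet("/data/mart/daily_revenue/")
```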

Data integration technologies


Data acquisition

  • We transfer data from the source to the target storage using tools such as Adoki, Spark, and Kafka
  • We can acquire data using batch, micro-batch, or real-time processes
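
As one illustration of the micro-batch approach, the sketch below uses Spark Structured Streaming to read events from a Kafka topic and land them in the target storage. The broker address, topic, and paths are assumptions for the example; Adoki-based transfers are configured differently.

```python
# Illustrative micro-batch acquisition with Spark Structured Streaming and Kafka.
# Broker, topic, and target paths are assumptions for this sketch;
# running it also requires the spark-sql-kafka connector package.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-acquisition").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "sensor-events")
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")
)

# Write to the landing layer in micro-batches every 5 minutes
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/landing/sensor-events/")
    .option("checkpointLocation", "/data/checkpoints/sensor-events/")
    .trigger(processingTime="5 minutes")
    .start()
)
query.awaitTermination()
```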

Data processing

  • Data processing is carried out via integration workflows
  • We automate these workflows using orchestration tools, for example Airflow
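
A minimal Airflow DAG sketching such an orchestrated workflow could look as follows; the task names, schedule, and commands are placeholders for the example.

```python
# A minimal Airflow DAG sketching an orchestrated integration workflow.
# Task names, schedule, and commands are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

with DAG(
    dag_id="data_integration_workflow",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # daily at 02:00 (Airflow 2.4+ "schedule" parameter)
    catchup=False,
) as dag:
    acquire = BashOperator(
        task_id="acquire_source_data",
        bash_command="echo 'acquire data into the landing layer'",
    )
    optimize = BashOperator(
        task_id="transform_to_optimized",
        bash_command="echo 'cast and transform data in the optimized layer'",
    )
    build_mart = BashOperator(
        task_id="build_data_mart",
        bash_command="echo 'aggregate data into the data mart'",
    )

    acquire >> optimize >> build_mart
```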

Metadata management

We store the metadata for each integrated resource
  • in the organization’s shared storage (GitHub, GitLab, Azure DevOps, etc.)
  • in metadata databases (SQL databases, NoSQL databases)

The metadata include

A general description of the data
  • data classification (internal, public, confidential, secret)
  • the owner and technical contacts (the integration developer or data analyst)

A description of the datasets
  • the table and column names and comments
  • data types and other technical dependencies, such as the names of the data sources and how they are connected to the target system

Related CI/CD pipelines
  • These pipelines automatically generate the data workflows. Once a workflow has been created, the entire process is automated and data can be acquired
    • automatically at a predefined time or interval
    • whenever new data are detected
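
Put together, a metadata record for one integrated dataset might look like the following sketch, shown here as a Python dictionary; all names, contacts, and values are illustrative assumptions.

```python
# An illustrative metadata record for one integrated dataset, as it might be
# versioned in a Git repository or stored in a metadata database.
# All names, contacts, and values below are assumptions for this sketch.
dataset_metadata = {
    "description": "Customer orders replicated from the CRM system",
    "classification": "confidential",  # internal / public / confidential / secret
    "owner": "sales department",
    "technical_contacts": ["integration.developer@example.com"],
    "source": {
        "system": "crm_orders_db",
        "table": "orders",
    },
    "target": {
        "layer": "optimized",
        "table": "crm.orders",
    },
    "columns": [
        {"name": "order_id", "type": "bigint", "comment": "Primary key of the order"},
        {"name": "order_date", "type": "date", "comment": "Date the order was placed"},
        {"name": "amount", "type": "decimal(18,2)", "comment": "Order value incl. VAT"},
    ],
    "ci_cd_pipelines": ["deploy-crm-orders-workflow"],
}
```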

References

Automotive: Integrating 8 IoT databases and 1 metadatabase to reduce load and save space in the source system


We have enabled a large automotive company to work efficiently with IoT sensor data from manufacturing. At the same time, we have lightened the system load and introduced data retention in the source system.

  • We have set up and now manage replications of 8 databases and 1 metadatabase with complete history, which has saved significant space in the source system.
  • For such replications, we use Adoki, Adastra’s replication tool, which
    • replicates all databases reliably and quickly,
    • cleans the data, removing inaccuracies, during the replication phase,
    • processes the data into the correct format.


The car manufacturer gained the required space savings in the source production system, the system load was reduced, and 12-month data retention was implemented.

Automotive: CRM system integration for more complex reports with an emphasis on anonymizing sensitive data

For one large automotive company, we have improved the data quality of customer data, integrated individual CRM modules into a unified data solution, and further enriched it with data from public registers.

  • We designed and built Spark data marts that are now consumed by business users via Power BI.
  • We are able to process more customer data and create more complex reports.
  • As a result, advanced data analysis is performed on both raw and aggregated data across all CRM domains.


Throughout the solution, we have emphasized anonymizing sensitive data, with some data anonymization already being carried out during acquisition.


Automotive: Integrating JIRA data to identify risk


By integrating data directly from the JIRA source system, we are able to prepare a detailed overview of the status of multiple projects, including all their subtasks and timesheets, for a large automotive company.

  • We have developed an application that uses Adoki to download JIRA data every day, transform them into the correct format, and upload them to the target system.
  • We are further analyzing and visualizing the integrated data in reports that allow the customer to
    • monitor project statuses overall and in detail,
    • monitor progress on their subtasks,
    • detect risk areas down to the timesheet or project-financing level.

As we are able to determine risks early, we can give the persons responsible on the customer’s side advance warning of any potential dangers or complications.
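
For illustration, a simplified daily download could be built directly on the public JIRA REST API, as sketched below. The real solution uses Adoki; the URL, JQL query, credentials, and output format here are assumptions for the example.

```python
# A simplified sketch of a daily JIRA download via the JIRA REST API (v2 search).
# URL, JQL, credentials, and the output file are illustrative assumptions;
# the production solution described above is built on Adoki.
import csv
import requests

JIRA_URL = "https://jira.example.com/rest/api/2/search"
AUTH = ("service_account", "api_token")  # assumed credentials

issues, start_at = [], 0
while True:
    resp = requests.get(
        JIRA_URL,
        params={"jql": "project = DEMO", "startAt": start_at, "maxResults": 100,
                "fields": "summary,status,timetracking"},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    page = resp.json()
    if not page["issues"]:
        break
    issues.extend(page["issues"])
    start_at += len(page["issues"])
    if start_at >= page["total"]:
        break

# Transform into a flat format and store it for upload to the target system
with open("jira_issues.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["key", "summary", "status", "time_spent_seconds"])
    for issue in issues:
        fields = issue["fields"]
        writer.writerow([
            issue["key"],
            fields.get("summary"),
            fields.get("status", {}).get("name"),
            fields.get("timetracking", {}).get("timeSpentSeconds"),
        ])
```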

Manufacturing: Integrating public registers to improve data quality

At a large manufacturing company, we have automated data uploads from public registers. This has eliminated errors caused by free-form manual entry of data on company addresses and executives.

  • We decided on automated data uploads from public registers.
  • We set up regular integrations – daily data uploads and daily checks.
  • We continuously verify that customer input data are accurate and up to date.


Interested in a solution tailored to your needs? Contact us today.


Dagmar Bínová

Big Data & Data Science Team Lead

Tomáš Plánička

Big Data Solution Architect