Building a new Big Data platform that processes tens of billions of records every day? With Adastra, done in 3 months!

Close to 1 million records per second

3 months to build a versatile Big Data platform

Managing the network and data traffic of a major Internet service provider in the Czech Republic is not an easy task; it relies on real data that is stored in real time.

The volumes of transmitted data are enormous – a perfect use case for Big Data technologies. But what if you don’t have any experience with these?

New Big Data platform can handle large volumes of network traffic metadata

This is precisely the situation the Internet service provider found itself in, and it posed a big challenge. The company opened a tender to find a solution for storing large volumes of metadata about network traffic, placing a strong emphasis on the new solution’s smooth operation and high availability. The ISP approached previous and new suppliers, including Adastra, which has extensive Big Data experience from a wide range of successfully implemented projects in various sectors.

Impossible on demand: New Big Data platform goes live a month early

The installation and implementation of the Hadoop platform took 3 months, during which time testing was also carried out. Adastra’s proposed design worked well from the beginning, so there was no need to make any significant changes, and the new Big Data platform was put into operation a month earlier than originally planned.

New platform for Internet traffic: 1 petabyte storage capacity, 300 compute threads and 2.5 terabytes of RAM

Adastra came up with a purely generic data platform that delivers excellent computing power and whose disk capacity can easily be extended according to the client’s future needs. Taking operating costs into account, Adastra suggested a smaller cluster, which fully meets current requirements, provides high computing power, and has sufficient storage capacity. The solution delivers close to 1 PB (petabyte) of space, provides 300 compute threads, has 2.5 TB of RAM, and uses the Hortonworks distribution. The network traffic metadata is processed with the Spark framework, alongside stable technologies such as Apache Kafka, Apache Hadoop and Apache HBase.

The new Big Data platform processes all the data related to the client’s Internet traffic. It enables flexible cluster-resource allocation depending on the required data flow. Real data flows are in the tens of billions of records every day. When allocated 100 compute threads, Spark calculates the basic daily aggregation within 12 minutes.
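To put these figures in perspective, the arithmetic can be sketched in a few lines of Python. The 12 minutes and 100 compute threads come from the text above; the daily volume of 20 billion records is an assumed value chosen only for illustration (the case study says only "tens of billions"), the function names are hypothetical, and the sketch assumes the work is spread evenly across threads.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_ingest_rate(records_per_day: float) -> float:
    """Average number of records arriving per second over one day."""
    return records_per_day / SECONDS_PER_DAY


def aggregation_throughput(records_per_day: float, minutes: float, threads: int):
    """Records per second the daily aggregation must sustain.

    Returns (cluster-wide rate, rate per compute thread), assuming the
    whole day's records are read once and split evenly across threads.
    """
    cluster_rate = records_per_day / (minutes * 60)
    return cluster_rate, cluster_rate / threads


# ASSUMPTION: 20 billion records per day is an illustrative value only.
daily_records = 20_000_000_000

print(f"average ingest: {average_ingest_rate(daily_records):,.0f} records/s")

cluster_rate, per_thread = aggregation_throughput(daily_records, minutes=12, threads=100)
print(f"aggregation:    {cluster_rate:,.0f} records/s across the cluster")
print(f"per thread:     {per_thread:,.0f} records/s")
```

Under these assumptions, the daily aggregation has to read data far faster than it arrives, which is why the number of threads allocated to the job matters.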

The solution also includes a third-party component for the high-performance conversion of network metadata from probes, which was successfully tested on the client’s system during the PoC (proof of concept).

Reporting

The Big Data platform is an excellent basis for follow-up activities and development: a complete reporting layer can be built on top of the stored data, and information can be made accessible and visualized for end users.

Maximum performance

The client can handle yearly increases in data volume simply by adding several cores to the streaming application on YARN. Increasing the number of cores lets the storage throughput reach close to 1 million records per second. The client can thus respond flexibly to current needs in terms of both data traffic and the computing power required for Advanced Analytics and related Machine Learning.
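The scaling argument above can be illustrated with a rough capacity estimate. This is only a sketch: the per-core throughput, the current rate, and the growth rate below are hypothetical values, not figures from the project, and the calculation assumes throughput grows linearly with the number of cores, which real streaming applications only approximate.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_capacity(records_per_second: int) -> int:
    """Records that can be stored in one day at a sustained rate."""
    return records_per_second * SECONDS_PER_DAY


def cores_needed(target_rate: float, per_core_rate: float) -> int:
    """Cores required for a target rate, assuming linear scaling."""
    return math.ceil(target_rate / per_core_rate)


def rate_after_growth(current_rate: float, yearly_growth: float, years: int) -> float:
    """Projected rate after compounding yearly growth."""
    return current_rate * (1 + yearly_growth) ** years


# 1 million records per second is the upper figure quoted in the text.
print(f"{daily_capacity(1_000_000):,} records/day")  # 86,400,000,000 records/day

# ASSUMPTIONS: 25,000 records/s per core, 300,000 records/s today,
# and 30 % yearly growth are all illustrative values.
per_core_rate = 25_000
projected = rate_after_growth(300_000, yearly_growth=0.30, years=3)
print(f"projected rate: {projected:,.0f} records/s")
print(f"cores needed:   {cores_needed(projected, per_core_rate)}")
```

At a sustained 1 million records per second, daily capacity is 86.4 billion records, which is consistent with data flows in the tens of billions per day.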

Versatility

The solution isn’t tied to a specific hardware manufacturer or server type. The cluster can be extended with any type of server, depending on availability, or with specialized GPU servers for accelerating Machine Learning algorithms.

24/7 support

Adastra provides 24/7 support for operations and troubleshooting.
