This is precisely what the Internet service provider realized, and it presented a major challenge. The company opened a tender for a solution to store large volumes of network traffic metadata, placing strong emphasis on smooth operation and high availability. The ISP approached both previous and new suppliers, including Adastra, which has extensive Big Data experience from projects implemented successfully across a wide range of sectors.
Installation and implementation of the Hadoop platform, including testing, took 3 months. Adastra’s proposal worked well from the beginning, so no significant changes were needed, and the new Big Data platform went into operation a month earlier than originally planned.
Adastra proposed a fully generic data platform that delivers excellent computing power and whose disk capacity can easily be extended to meet the client’s future needs. Taking operating costs into account, Adastra suggested a smaller cluster that fully meets current requirements, provides high computing power, and has sufficient storage capacity. The solution delivers close to 1 PB (petabyte) of storage, provides 300 compute threads and 2.5 TB of RAM, and uses the Hortonworks distribution. The network traffic metadata is processed with the Spark framework, together with proven technologies such as Apache Kafka, Apache Hadoop, and Apache HBase.
The new Big Data platform processes all the data related to the client’s Internet traffic. It enables flexible allocation of cluster resources depending on the required data flow. Real data volumes run into the tens of billions every day. With 100 compute threads allocated, Spark calculates the basic daily aggregation within 12 minutes.
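To put these figures in perspective, the following is a rough back-of-envelope sketch in Python rather than a description of Adastra's actual sizing method. The thread counts, RAM, and 12-minute window come from the text above. The daily volume is an assumption: the text says only "tens of billions", so 20 billion records per day is used purely for illustration, and the function names are hypothetical.

```python
# Figures quoted in the case study.
TOTAL_THREADS = 300   # compute threads in the whole cluster
TOTAL_RAM_TB = 2.5    # cluster RAM in terabytes
JOB_THREADS = 100     # threads allocated to the daily aggregation
JOB_MINUTES = 12      # upper bound on the aggregation run time

# ASSUMPTION (not from the source): "tens of billions" is taken to mean
# 20 billion records per day, for illustration only.
ASSUMED_RECORDS_PER_DAY = 20_000_000_000


def cluster_share(job_threads: int, total_threads: int) -> float:
    """Fraction of the cluster's compute threads used by one job."""
    return job_threads / total_threads


def ram_per_thread_gb(total_ram_tb: float, total_threads: int) -> float:
    """Average RAM per compute thread in GB (decimal units, 1 TB = 1000 GB)."""
    return total_ram_tb * 1000 / total_threads


def required_throughput(records: int, threads: int, minutes: float):
    """Return (records/s overall, records/s per thread) needed to
    process `records` within `minutes` on `threads` threads."""
    seconds = minutes * 60
    overall = records / seconds
    return overall, overall / threads


if __name__ == "__main__":
    share = cluster_share(JOB_THREADS, TOTAL_THREADS)
    ram = ram_per_thread_gb(TOTAL_RAM_TB, TOTAL_THREADS)
    overall, per_thread = required_throughput(
        ASSUMED_RECORDS_PER_DAY, JOB_THREADS, JOB_MINUTES
    )
    print(f"Share of cluster threads used: {share:.1%}")
    print(f"Average RAM per thread: {ram:.2f} GB")
    print(f"Required throughput: {overall:,.0f} records/s overall")
    print(f"Required throughput: {per_thread:,.0f} records/s per thread")
```

Under the assumed 20 billion records, the daily aggregation uses one third of the cluster's threads and would need to sustain roughly 28 million records per second overall, or about 280,000 per thread. The real figures depend on the actual daily volume and record size, neither of which is stated here.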
The solution also includes a third-party component for high-performance conversion of network metadata from probes, which was successfully tested on the client’s system during the proof of concept (PoC).