
Hadoop, an Open-source Software Framework

Hadoop is a software framework for storing large amounts of data and running applications on clusters of computers. The open-source framework is written in Java and allows processing data with simple programming models. Each module carries out a specific task essential to a computer system designed for big data analytics. With Hadoop, users can quickly write and test distributed systems.

 

The Hadoop framework consists of four principal modules:

 

  • Hadoop Common includes the Java libraries and other tools required by the other framework modules. These libraries contain the scripts and files necessary to start Hadoop.
  • Hadoop Distributed File System (HDFS) ensures high-speed access to application data. This module allows storing data across a large number of connected storage devices in a format that ensures easy access.
  • Hadoop YARN is a framework for managing cluster resources and scheduling jobs.
  • MapReduce is a YARN-based system used for parallel processing of big datasets. It performs two basic procedures: reading data and organizing it for analysis (the map step), and carrying out calculations and other aggregation operations on it (the reduce step).
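The map and reduce steps described above can be sketched in plain Java. This is a single-process illustration of the idea only, not the real Hadoop MapReduce API: the "map" step emits (word, 1) pairs, and the "reduce" step sums the counts for each word.

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {

    // Map phase: split each input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Reduce phase: sum all counts emitted for the same word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(
                Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data is big", "data is everywhere");
        List<Map.Entry<String, Integer>> mapped = input.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());
        Map<String, Integer> counts = reduce(mapped);
        System.out.println(counts.get("big"));  // 2
        System.out.println(counts.get("data")); // 2
    }
}
```

In real Hadoop, the mapped pairs would be shuffled across the network so that all pairs with the same key arrive at the same reducer; here both phases simply run in one process.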

 

There are also additional software packages, such as Apache Spark, Apache HBase, Apache Hive, and Apache Pig, that can be installed on top of Hadoop or alongside it.

 

Hadoop can work with a variety of distributed file systems, such as S3 FS, HFTP FS, and Local FS, but the Hadoop Distributed File System (HDFS) is used most commonly. This distributed file system is reliable and can run on large clusters of computers. It features a master/slave architecture in which a single NameNode manages the file system's metadata and slave DataNodes store the actual data.
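The master/slave split can be illustrated with a toy sketch (hypothetical class names, not the real HDFS API): the "NameNode" holds only metadata, namely which blocks make up a file and which "DataNode" holds each block, while the block contents themselves live on the DataNodes.

```java
import java.util.*;

public class MiniHdfsSketch {

    static final int BLOCK_SIZE = 8; // toy value; the real HDFS default is 128 MB

    // DataNode (slave): stores actual block contents, keyed by block id.
    static class DataNode {
        final Map<Integer, String> blocks = new HashMap<>();
    }

    // NameNode (master): stores metadata only — file name -> ordered block
    // ids, and block id -> the DataNode that holds it.
    static class NameNode {
        final Map<String, List<Integer>> fileBlocks = new HashMap<>();
        final Map<Integer, DataNode> blockLocation = new HashMap<>();
        int nextBlockId = 0;
    }

    // Write: split the data into fixed-size blocks, spread them round-robin
    // across the DataNodes, and record the metadata on the NameNode.
    static void write(NameNode nn, List<DataNode> dns, String file, String data) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < data.length(); i += BLOCK_SIZE) {
            String chunk = data.substring(i, Math.min(i + BLOCK_SIZE, data.length()));
            int id = nn.nextBlockId++;
            DataNode dn = dns.get(id % dns.size());
            dn.blocks.put(id, chunk);
            nn.blockLocation.put(id, dn);
            ids.add(id);
        }
        nn.fileBlocks.put(file, ids);
    }

    // Read: ask the NameNode where the blocks live, then fetch them in order.
    static String read(NameNode nn, String file) {
        StringBuilder sb = new StringBuilder();
        for (int id : nn.fileBlocks.get(file)) {
            sb.append(nn.blockLocation.get(id).blocks.get(id));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode();
        List<DataNode> dns = List.of(new DataNode(), new DataNode(), new DataNode());
        write(nn, dns, "/logs/app.txt", "hello distributed file system");
        System.out.println(read(nn, "/logs/app.txt"));
    }
}
```

Real HDFS additionally replicates each block on several DataNodes for fault tolerance; this sketch stores a single copy to keep the metadata/data separation visible.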

 

In the classic MapReduce architecture, the framework has a single master JobTracker and one slave TaskTracker on every cluster node. The master manages resources, tracks their consumption and availability, schedules job tasks on the slaves, and monitors their execution.
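A toy illustration of this split (hypothetical classes, not Hadoop's actual JobTracker/TaskTracker API): the master breaks a job into tasks, hands them out round-robin to the slave trackers, and records which tasks completed.

```java
import java.util.*;

public class JobTrackerSketch {

    // Slave: runs one task and reports whether it succeeded.
    static class TaskTracker {
        boolean run(String task) {
            // A real TaskTracker would launch the task in a separate JVM;
            // here every task trivially "succeeds".
            return true;
        }
    }

    // Master: schedules tasks round-robin over the slaves and tracks status.
    static Map<String, Boolean> schedule(List<String> tasks, List<TaskTracker> slaves) {
        Map<String, Boolean> status = new LinkedHashMap<>();
        for (int i = 0; i < tasks.size(); i++) {
            TaskTracker slave = slaves.get(i % slaves.size());
            status.put(tasks.get(i), slave.run(tasks.get(i)));
        }
        return status;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("map-0", "map-1", "reduce-0");
        Map<String, Boolean> status =
                schedule(tasks, List.of(new TaskTracker(), new TaskTracker()));
        System.out.println(status);
    }
}
```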

 

Since Hadoop is written in Java, it is portable across all platforms that support a Java Virtual Machine.