Hadoop Project
Hadoop is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers.
The project includes these modules (a short command-line sketch follows the list):
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
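To make these modules concrete, here is a minimal command-line sketch, assuming a running Hadoop installation with its bin directory on the PATH (file and directory names are illustrative):

hdfs dfs -mkdir -p /demo             # HDFS: create a directory in the distributed file system
hdfs dfs -put localfile.txt /demo    # HDFS: store application data for high-throughput access
yarn node -list                      # YARN: list the nodes available for resource scheduling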
Other Hadoop-related projects at Apache include:
Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop.
Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
Current version: 2.6.2
* sequenceiq/hadoop-docker
hadoop-cluster-docker
How to build a multi-node Hadoop cluster based on Docker
Reference article (in Chinese): [How to build multi-node Hadoop Cluster based on Docker](http://cloud.51cto.com/art/201505/477851_all.htm)
In this project, I developed 4 Docker images (a build sketch follows the list): serf-dnsmasq, hadoop-base, hadoop-master, hadoop-slave
* serf-dnsmasq
Based on ubuntu:15.04. Serf and dnsmasq are installed to provide DNS service for the Hadoop cluster.
* hadoop-base
Based on serf-dnsmasq. OpenJDK, openssh-server, vim and Hadoop 2.3.0 are installed.
* hadoop-master
Based on hadoop-base. Configures the Hadoop master node.
* hadoop-slave
Based on hadoop-base. Configures the Hadoop slave node.
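The four images form a simple hierarchy: serf-dnsmasq is the base, hadoop-base builds on it, and hadoop-master and hadoop-slave both build on hadoop-base. If you prefer building locally over pulling, a hedged sketch (the one-directory-per-image layout is an assumption about the repository):

sudo docker build -t kiwenlau/serf-dnsmasq ./serf-dnsmasq      # FROM ubuntu:15.04
sudo docker build -t kiwenlau/hadoop-base ./hadoop-base        # FROM serf-dnsmasq
sudo docker build -t kiwenlau/hadoop-master ./hadoop-master    # FROM hadoop-base
sudo docker build -t kiwenlau/hadoop-slave ./hadoop-slave      # FROM hadoop-base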
Steps to build a 3-node Hadoop cluster
1. pull images
sudo docker pull kiwenlau/hadoop-master:0.1.0
sudo docker pull kiwenlau/hadoop-slave:0.1.0
sudo docker pull kiwenlau/hadoop-base:0.1.0
sudo docker pull kiwenlau/serf-dnsmasq:0.1.0
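After the pulls finish, you can confirm all four images are present (a quick sanity check, not part of the original steps):

sudo docker images | grep kiwenlau    # should list hadoop-master, hadoop-slave, hadoop-base and serf-dnsmasq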
2. clone source code
git clone https://github.com/kiwenlau/hadoop-cluster-docker
3. run container
cd hadoop-cluster-docker
./start-container.sh
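start-container.sh lives in this repository; as a hedged sketch of what it roughly does, it starts one master and two slave containers along these lines (the hostnames match the node names used below, but the exact flags are assumptions):

sudo docker run -d -t --dns 127.0.0.1 -h master.kiwenlau.com kiwenlau/hadoop-master:0.1.0    # --dns 127.0.0.1 points the container at its local dnsmasq
sudo docker run -d -t --dns 127.0.0.1 -h slave1.kiwenlau.com kiwenlau/hadoop-slave:0.1.0
sudo docker run -d -t --dns 127.0.0.1 -h slave2.kiwenlau.com kiwenlau/hadoop-slave:0.1.0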
4. test serf and dnsmasq service
In fact, you can skip this step and just wait for about 1 minute, since serf and dnsmasq need some time to start their services. If any nodes don't show up, wait a while longer: the serf agent needs time to recognize all nodes.
list all nodes of the Hadoop cluster:
serf members
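Once serf has converged, the output lists one line per node, all marked alive (addresses here are purely illustrative):

# master.kiwenlau.com  172.17.0.2:7946  alive
# slave1.kiwenlau.com  172.17.0.3:7946  alive
# slave2.kiwenlau.com  172.17.0.4:7946  alive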
test the DNS service by connecting to a slave node by hostname:
ssh slave2.kiwenlau.com
5. start hadoop
./start-hadoop.sh
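Once the script finishes, you can verify the Hadoop daemons with jps, a standard JDK tool (this check is not part of the script):

jps    # on the master expect NameNode, SecondaryNameNode and ResourceManager; on a slave, DataNode and NodeManager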
6. run wordcount
./run-wordcount.sh
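run-wordcount.sh lives in this repository; as a hedged sketch, the usual shape of such a script is to stage sample text in HDFS, run the WordCount example bundled with Hadoop 2.3.0, and print the result (file names and HDFS paths are assumptions):

echo "Hello Hadoop Hello Docker" > file.txt
hdfs dfs -mkdir -p input
hdfs dfs -put file.txt input                  # stage sample text in HDFS
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount input output
hdfs dfs -cat output/part-r-00000             # print the word counts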
Steps to build an arbitrary-size Hadoop cluster
1. preparation
follow steps 1~2 of the previous section: pull images and clone source code
2. rebuild hadoop-master
./resize-cluster.sh 5
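resize-cluster.sh is part of this repository; a hedged guess at its core is that it regenerates the list of slave hostnames for an N-node cluster and rebuilds the hadoop-master image so its configuration knows all N-1 slaves (every file name below is an assumption):

N=5
for i in $(seq 1 $((N - 1))); do echo "slave$i.kiwenlau.com"; done > hadoop-master/files/slaves    # assumed location of Hadoop's slaves file
sudo docker build -t kiwenlau/hadoop-master:0.1.0 hadoop-master/                                   # rebuild the master image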
3. start container
./start-container.sh 5
you'd better use the same parameter as in step 2 (rebuild hadoop-master), so the number of started containers matches the rebuilt configuration
4. run the hadoop cluster
follow steps 4~6 of the previous section: test serf and dnsmasq, start Hadoop, and run wordcount
please test the serf and dnsmasq service before starting Hadoop