ansible-hadoop

An Ansible role that creates a multi-node Apache Hadoop cluster for development purposes.

Features

Deploy DataNodes
Deploy NameNodes (single or HA)
Deploy Manager Nodes (ResourceManager, NodeManager, HistoryServer)
Offline or online install
Download a sample JAR for running teragen and terasort jobs
Default install dir: /opt/hadoop

Sample Usage

The role relies on specific inventory group names (hadoop_namenodes, hadoop_datanodes and hadoop_managers). A sample inventory file could look like this:

# Sample inventory.ini
[hadoop_namenodes]
node-1

[hadoop_datanodes]
node-1
node-2

[hadoop_managers]
# These hosts run the ResourceManager, NodeManager and MapReduce JobHistory Server
node-1

[hadoop:children]
hadoop_namenodes
hadoop_datanodes
hadoop_managers

[hadoop:vars]
# If set to True, the NameNode is reformatted and all existing HDFS data (if any) is deleted
hadoop_reformat_namenode=True

# A shared edits dir is required for HA, so set it only when more than one NameNode is listed.
# The directory must be shared and accessible from all NameNodes
# (e.g. an NFS share mounted on every NameNode).
# hadoop_hdfs_site={"dfs.namenode.shared.edits.dir": "/mnt/gpfs0/hdfs-ha"}

# Online install: each node needs internet access
hadoop_distro_url=https://www-eu.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
hadoop_jre_url=https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u222-b10/OpenJDK8U-jre_x64_linux_hotspot_8u222b10.tar.gz
hadoop_sample_jar_urls=['https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-examples/3.1.3/hadoop-mapreduce-examples-3.1.3.jar']

# For an offline install, download the JRE and the Hadoop distro manually, place them on the Ansible control node and uncomment the following vars
#hadoop_distro_local_pkg=/root/hadoop-3.1.3.tar.gz
#hadoop_jre_local_pkg=/root/OpenJDK8U-jre_x64_linux_hotspot_8u222b10.tar.gz
#hadoop_sample_jar_pkgs=['/root/hadoop-mapreduce-examples-3.1.3.jar']
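Before running the playbook, it can help to confirm that the inventory defines the groups the role expects. The script below is a minimal sketch and is not part of the role: the helper names inventory_hosts and check_inventory are hypothetical, and the parser assumes a simple INI inventory like the sample above (one host per line, [group] section headers). It also flags an inventory that lists more than one NameNode without setting dfs.namenode.shared.edits.dir in hadoop_hdfs_site.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a simple INI inventory used with this role.
# Assumes one host per line and [group] section headers, as in the sample above.
set -euo pipefail

# Print the first field of each host line in the [GROUP] section(s) of INVENTORY.
# Blank lines and lines starting with '#' or ';' are skipped.
inventory_hosts() {
  local inventory=$1 group=$2
  awk -v hdr="[${group}]" '
    /^[ \t]*\[/ {
      sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")   # tolerate stray whitespace around headers
      in_group = ($0 == hdr)
      next
    }
    in_group && NF && $1 !~ /^[#;]/ { print $1 }
  ' "$inventory"
}

# Return non-zero if a required group is empty, or if several NameNodes are
# listed without a shared edits dir in hadoop_hdfs_site (needed for HA).
check_inventory() {
  local inventory=$1 group count status=0
  for group in hadoop_namenodes hadoop_datanodes hadoop_managers; do
    count=$(inventory_hosts "$inventory" "$group" | wc -l)
    if [ "$count" -eq 0 ]; then
      echo "missing or empty group: [${group}]" >&2
      status=1
    fi
  done
  count=$(inventory_hosts "$inventory" hadoop_namenodes | wc -l)
  if [ "$count" -gt 1 ] &&
     ! grep -q '^hadoop_hdfs_site=.*dfs\.namenode\.shared\.edits\.dir' "$inventory"; then
    echo "${count} NameNodes listed but hadoop_hdfs_site sets no dfs.namenode.shared.edits.dir" >&2
    status=1
  fi
  return "$status"
}
```

For example, source the script and run check_inventory inventory.ini before ansible-playbook; a non-zero exit status means a group is missing or the HA setting is absent. Ansible can also print the parsed group tree with ansible-inventory -i inventory.ini --graph.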

# Sample playbook
---
- hosts: hadoop
  remote_user: root
  gather_facts: no
  roles:
    - andiveloper.hadoop

ansible-galaxy install andiveloper.hadoop
ansible-playbook -i inventory.ini playbook.yml

Testing your cluster setup

If the playbook finishes successfully, you can run a sample job with the downloaded JAR, for example:

su hdfs

export PATH=$PATH:/opt/hadoop/current/bin

# Basic functionality:
hdfs dfs -ls /
hdfs dfs -mkdir -p /user/hdfs

# Teragen + Terasort job (generating 1GB of data)

OUTPUT_DIR=/performance

hdfs dfs -rm -r -f $OUTPUT_DIR

time yarn jar /opt/hadoop/samples/hadoop-mapreduce-examples-3.1.3.jar teragen -Dmapreduce.job.maps=128 -Ddfs.blocksize=128M 10000000 $OUTPUT_DIR/teragen_1G_128maps

time yarn jar /opt/hadoop/samples/hadoop-mapreduce-examples-3.1.3.jar terasort -D mapreduce.map.output.compress=true $OUTPUT_DIR/teragen_1G_128maps $OUTPUT_DIR/teragen_1G_128maps_sorted_1
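The numeric argument to teragen is a row count, not a byte count. Each teragen row is 100 bytes, so the 10000000 rows above produce 1,000,000,000 bytes, the 1G in the output directory name. The sketch below does that arithmetic for other target sizes; teragen_rows_for_bytes and teragen_bytes_per_map are hypothetical helpers, and the per-map figure assumes rows are spread evenly across map tasks.

```bash
#!/usr/bin/env bash
# Sketch: sizing arithmetic for teragen, which writes fixed 100-byte rows.
set -euo pipefail

readonly TERAGEN_ROW_BYTES=100

# Rows teragen must generate for TARGET_BYTES of output (rounded down).
teragen_rows_for_bytes() {
  local target_bytes=$1
  echo $(( target_bytes / TERAGEN_ROW_BYTES ))
}

# Approximate bytes written per map task, assuming rows are spread evenly.
teragen_bytes_per_map() {
  local rows=$1 maps=$2
  echo $(( rows / maps * TERAGEN_ROW_BYTES ))
}

rows=$(teragen_rows_for_bytes 1000000000)                  # 1 GB (decimal)
echo "rows=${rows}"                                         # rows=10000000
echo "bytes_per_map=$(teragen_bytes_per_map "$rows" 128)"  # bytes_per_map=7812500
```

With 128 maps, each map therefore writes roughly 7.8 MB, less than one 128M HDFS block. For a 10 GB run, teragen_rows_for_bytes 10000000000 gives 100000000 rows.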
