2. About hosts(inventory file)

This project provides two types of Ansible hosts sample. The following is the location of the main services.

hosts.medium_sample:

  • Master Services including NameNode and ResourceManager: 3 nodes

  • Client: 1 node

  • Slave: 5 nodes

  • Manage: 1 nodes

  • Kafka: 3 nodes

hosts.large_sample:

  • NameNode: 2 nodes

  • Zookeeper and JournalNode: 3 nodes

  • ResourceManager: 2 nodes

  • Client: 1 node

  • Slave: 10 nodes

  • Manage: 1 nodes

  • Kafka: 3 nodes

3. About groups in inventory

This example of inventory includes the following configured groups.

Main group

group

description

production

The top group which represents the whole of the environment, such as data centers. This group is used to define the environmental specific parameters.

local

The dummy group to define localhost in inventory.

hadoop_all

The group whih represents the whole of Hadoop cluster. This group includes all groups and nodes in the Hadoop cluster.

hadoop_master

This group represents all master nodes.

hadoop_namenode

This group represents the primary NameNode and the backup NameNode

hadoop_journalnode

This group represents JournalNodes.

hadoop_zookeeperserver

This group represents Zookeeper nodes. Important : The parameter “zookeeper_server_id” is configured with each nodes.

hadoop_resourcemanager

This group represents ResourceManagers

hadoop_other

This group represents nodes which provide Hadoop-related services, such as HistoryServer.

hadoop_slave

This group represents slave nodes.

hadoop_client

This group represents client nodes. The client nodes are used to execute commands to access Hadoop services and other related services.

hadoop_pseudo

This group represents a node which provides Hadoop pseudo environment. This is mainly used for the application development.

manage

This group represents nodes which provides the management services, such as Ganglia and Graphite.

kafka_cluster

This group represents Kafka brokers of Apache Kafka (Community version)

confluent_kafka_cluster

This group represents Kafka brokers of Confluent Kafka

confluent_schema_registry

This group represents Confluent’s schema registry service nodes

confluent_kafka_rest

This group represents Confluent’s REST Proxy serivce nodes

data_loader

This group represents nodes which provide the services to load data to the cluster. e.g. fluentd and td-agent

endosnipe

This group represents nodes which provide EndoSNipe servides, such as a dashbord.

heapstats

This group represents nodes which use heapstats to monitor JVM processes.

3.1. Managing several clusters

If you want to manage several Hadoop clusters or environments, you can distinguish these environments by using different inventries which have different top-level groups.

e.g. the production environments, the test environments, the development environments and so on.

The group variables of each group define parameters specific to each environment.

Example

  • group_vars/all/something … This file provides default parameters common for all environments.

  • group_vars/production/something … This file provides parameters common for the production environments.

  • group_vars/test/something … This file provides parameters common for the test environments.