2. About hosts (inventory file)

This project provides two sample Ansible hosts (inventory) files. The main services are allocated in each sample as follows.

hosts.medium_sample:

Master Services including NameNode and ResourceManager: 3 nodes

Client: 1 node

Slave: 5 nodes

Manage: 1 node

Kafka: 3 nodes

hosts.large_sample:

NameNode: 2 nodes

Zookeeper and JournalNode: 3 nodes

ResourceManager: 2 nodes

Client: 1 node

Slave: 10 nodes

Manage: 1 node

Kafka: 3 nodes

3. About groups in inventory

The sample inventory defines the following groups.

Main group

Group	Description
production	The top-level group, which represents the whole environment, such as a data center. This group is used to define environment-specific parameters.
local	A dummy group used to define localhost in the inventory.
hadoop_all	The group which represents the whole Hadoop cluster. This group includes all groups and nodes in the Hadoop cluster.
hadoop_master	This group represents all master nodes.
hadoop_namenode	This group represents the primary NameNode and the backup NameNode.
hadoop_journalnode	This group represents JournalNodes.
hadoop_zookeeperserver	This group represents Zookeeper nodes. Important: the parameter “zookeeper_server_id” must be configured for each node.
hadoop_resourcemanager	This group represents ResourceManagers.
hadoop_other	This group represents nodes which provide Hadoop-related services, such as HistoryServer.
hadoop_slave	This group represents slave nodes.
hadoop_client	This group represents client nodes. The client nodes are used to execute commands that access Hadoop services and other related services.
hadoop_pseudo	This group represents a node which provides a Hadoop pseudo-distributed environment. This is mainly used for application development.
manage	This group represents nodes which provide management services, such as Ganglia and Graphite.
kafka_cluster	This group represents Kafka brokers of Apache Kafka (community version).
confluent_kafka_cluster	This group represents Kafka brokers of Confluent Kafka.
confluent_schema_registry	This group represents Confluent’s Schema Registry service nodes.
confluent_kafka_rest	This group represents Confluent’s REST Proxy service nodes.
data_loader	This group represents nodes which provide services to load data into the cluster, e.g. fluentd and td-agent.
endosnipe	This group represents nodes which provide EndoSNipe services, such as a dashboard.
heapstats	This group represents nodes which use heapstats to monitor JVM processes.
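
As a rough illustration of how some of these groups might be wired together in an INI-style inventory, consider the minimal sketch below. The host names (master01, slave01, and so on) and the exact nesting of child groups are assumptions made for illustration and are not copied from the sample files. Only the group names and the zookeeper_server_id parameter come from the table above.

```ini
# Hypothetical sketch: host names and group nesting are illustrative assumptions.

[hadoop_namenode]
master01
master02

[hadoop_journalnode]
master01
master02
master03

[hadoop_zookeeperserver]
# zookeeper_server_id is set per host, as the table requires.
master01 zookeeper_server_id=1
master02 zookeeper_server_id=2
master03 zookeeper_server_id=3

[hadoop_resourcemanager]
master02
master03

[hadoop_slave]
# Ansible range pattern: expands to slave01 .. slave05.
slave[01:05]

[hadoop_client]
client01

# Assumed nesting: the individual master groups roll up into hadoop_master.
[hadoop_master:children]
hadoop_namenode
hadoop_journalnode
hadoop_zookeeperserver
hadoop_resourcemanager

[hadoop_all:children]
hadoop_master
hadoop_slave
hadoop_client

# Top-level group representing the whole environment.
[production:children]
hadoop_all
```

With a layout like this, variables defined for production would apply to every host in the cluster, while variables defined for a narrower group such as hadoop_slave would apply only to its members.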

3.1. Managing several clusters

If you want to manage several Hadoop clusters or environments, you can distinguish them by using separate inventories, each with a different top-level group.

For example, you might maintain production, test, and development environments.

The group variables of each top-level group define the parameters specific to that environment.

Example

group_vars/all/something … This file provides default parameters common to all environments.
group_vars/production/something … This file provides parameters common to the production environments.
group_vars/test/something … This file provides parameters common to the test environments.
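
The following minimal sketch shows how a second environment could be separated, assuming a hypothetical inventory file named hosts.test whose top-level group is test. The file name and host names are invented for illustration. The group names follow the table in section 3.

```ini
# hosts.test (hypothetical file name): inventory for a test environment.

[hadoop_master]
test-master01

[hadoop_slave]
test-slave01
test-slave02

[hadoop_all:children]
hadoop_master
hadoop_slave

# The top-level group is "test" rather than "production", so hosts here
# receive variables from group_vars/test/ and not group_vars/production/.
[test:children]
hadoop_all
```

Passing this file to Ansible with the -i option (for example, ansible-playbook -i hosts.test followed by a playbook) would then combine the defaults under group_vars/all/ with the overrides under group_vars/test/, provided the group_vars directory sits next to the inventory or the playbook.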