2. About hosts (inventory file)
This project provides two sample Ansible inventory files (hosts files). The main services are placed on the nodes as follows.
hosts.medium_sample:
Master Services including NameNode and ResourceManager: 3 nodes
Client: 1 node
Slave: 5 nodes
Manage: 1 node
Kafka: 3 nodes
hosts.large_sample:
NameNode: 2 nodes
Zookeeper and JournalNode: 3 nodes
ResourceManager: 2 nodes
Client: 1 node
Slave: 10 nodes
Manage: 1 node
Kafka: 3 nodes
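As a rough illustration of the medium layout, an INI inventory along the following lines would produce the node counts listed above. This is only a sketch: the host names are hypothetical, the group names are taken from the group table in the next section, and placing the three master hosts directly in hadoop_master is a simplifying assumption. The actual assignment is defined in hosts.medium_sample itself.

```ini
# Hypothetical host names; the real ones are defined in hosts.medium_sample.
# "[01:03]" is Ansible's range pattern and expands to 01, 02, 03.

# Master services, including NameNode and ResourceManager: 3 nodes
# (assumption: hosts are listed directly in this group for brevity)
[hadoop_master]
master[01:03]

# Client: 1 node
[hadoop_client]
client01

# Slave: 5 nodes
[hadoop_slave]
slave[01:05]

# Manage: 1 node
[manage]
manage01

# Kafka: 3 nodes
[kafka_cluster]
kafka[01:03]
```

For the large layout, the same pattern would apply with separate hosts for the NameNode, Zookeeper/JournalNode, and ResourceManager groups and a wider slave range.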
3. About groups in inventory
The sample inventories define the following groups.
Main group
| group | description | 
|---|---|
| production | The top group which represents the whole of the environment, such as data centers. This group is used to define the environmental specific parameters. | 
| local | The dummy group to define localhost in inventory. | 
| hadoop_all | The group which represents the whole Hadoop cluster. This group includes all groups and nodes in the Hadoop cluster. | 
| hadoop_master | This group represents all master nodes. | 
| hadoop_namenode | This group represents the primary NameNode and the backup NameNode. | 
| hadoop_journalnode | This group represents JournalNodes. | 
| hadoop_zookeeperserver | This group represents Zookeeper nodes. Important: the parameter “zookeeper_server_id” is configured for each node. | 
| hadoop_resourcemanager | This group represents ResourceManagers. | 
| hadoop_other | This group represents nodes which provide Hadoop-related services, such as HistoryServer. | 
| hadoop_slave | This group represents slave nodes. | 
| hadoop_client | This group represents client nodes. The client nodes are used to execute commands to access Hadoop services and other related services. | 
| hadoop_pseudo | This group represents a node which provides a Hadoop pseudo-distributed environment. This is mainly used for application development. | 
| manage | This group represents nodes which provide the management services, such as Ganglia and Graphite. | 
| kafka_cluster | This group represents Kafka brokers of Apache Kafka (community version). | 
| confluent_kafka_cluster | This group represents Kafka brokers of Confluent Kafka. | 
| confluent_schema_registry | This group represents Confluent’s schema registry service nodes. | 
| confluent_kafka_rest | This group represents Confluent’s REST Proxy service nodes. | 
| data_loader | This group represents nodes which provide services to load data into the cluster, such as fluentd and td-agent. | 
| endosnipe | This group represents nodes which provide EndoSNipe services, such as a dashboard. | 
| heapstats | This group represents nodes which use heapstats to monitor JVM processes. | 
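The relationship between these groups can be sketched in an INI inventory using Ansible's `:children` sections. The fragment below is a minimal sketch, not the project's actual file: the host names, the `zookeeper_server_id` values, and the exact nesting of groups are assumptions made for illustration. Refer to the sample inventories for the real layout.

```ini
# Hypothetical hosts and nesting; only the group names and the
# zookeeper_server_id parameter come from the table above.

[hadoop_namenode]
master01
master02

[hadoop_journalnode]
master[01:03]

[hadoop_zookeeperserver]
# zookeeper_server_id is a per-host variable, so it is set on each line.
# The values 1-3 are illustrative.
master01 zookeeper_server_id=1
master02 zookeeper_server_id=2
master03 zookeeper_server_id=3

[hadoop_resourcemanager]
master02
master03

[hadoop_slave]
slave[01:05]

[hadoop_client]
client01

# Assumed nesting: the master service groups roll up into hadoop_master.
[hadoop_master:children]
hadoop_namenode
hadoop_journalnode
hadoop_zookeeperserver
hadoop_resourcemanager

# hadoop_all covers every group in the Hadoop cluster.
[hadoop_all:children]
hadoop_master
hadoop_slave
hadoop_client

# production is the top-level group for this environment.
[production:children]
hadoop_all
```

With nesting like this, a variable defined for production applies to every host beneath it, while a more specific group such as hadoop_slave can override it.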
3.1. Managing several clusters
If you want to manage several Hadoop clusters or environments, such as production, test, and development environments, you can distinguish them by using separate inventories, each with a different top-level group.
The group variables of each top-level group then define the parameters specific to that environment.
Example
- group_vars/all/something … This file provides default parameters common to all environments.
- group_vars/production/something … This file provides parameters common to the production environments.
- group_vars/test/something … This file provides parameters common to the test environments.
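As a sketch of how a second environment might be set up, a test inventory could reuse the same Hadoop groups but place them under a top-level group named test instead of production. The file name hosts.test and the host names below are hypothetical.

```ini
# hosts.test (hypothetical file name): inventory for a test cluster.

[hadoop_slave]
test-slave[01:03]

[hadoop_all:children]
hadoop_slave

# The top-level group is "test" rather than "production", so these hosts
# pick up group_vars/test/ in addition to group_vars/all/.
[test:children]
hadoop_all
```

An inventory like this would be selected with the `-i` option of `ansible-playbook`. Because the hosts belong to test and not to production, values under group_vars/production/ would not apply to them, and values under group_vars/test/ would take precedence over the defaults in group_vars/all/.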