1. Abstract
1.1. About playbooks
This is a library of Ansible playbooks for building HDFS/YARN clusters together with Big Data tools such as Apache Spark. You can build either a Hadoop cluster with HA or a pseudo Hadoop environment on a single server.
The roles contain only basic configurations. I recommend customizing or parameterizing the roles so that the systems are configured appropriately for your workload.
1.2. Features
The project contains the following features.
- Examples of Ansible inventory files
- Basic configurations provided via role variables and group_vars 
- Roles to configure and operate middleware 
- Playbooks to configure and operate middleware 
The main products this project can deploy are:
- Bigtop-based Apache Hadoop cluster
  * Pseudo environment
  * Distributed environment with NameNode and ResourceManager HA
- Apache Spark (Bigtop-based and community versions)
1.3. Servers
This project assumes the following assignment of middleware components to servers.
Servers for medium cluster
| Server | Use for | 
|---|---|
| master01 | Primary NameNode, JournalNode, ZooKeeper Server(id=1), Ganglia Slave | 
| master02 | JournalNode, ZooKeeper Server(id=2), Primary ResourceManager, Ganglia Slave | 
| master03 | JournalNode, ZooKeeper Server(id=3), HistoryServer, Standby ResourceManager, Standby NameNode, Ganglia Slave, Ganglia Master, InfluxDB, Grafana, Spark History Server | 
| client01 | Hadoop Client, Spark Client, Ganglia Slave, Zeppelin | 
| slave01 | DataNode, NodeManager, Ganglia Slave | 
| slave02 | DataNode, NodeManager, Ganglia Slave | 
| slave03 | DataNode, NodeManager, Ganglia Slave | 
| slave04 | DataNode, NodeManager, Ganglia Slave | 
| slave05 | DataNode, NodeManager, Ganglia Slave | 
| kafka01 | Kafka broker | 
| kafka02 | Kafka broker | 
| kafka03 | Kafka broker | 
| manage | Ambari server | 
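
The medium-cluster layout above could be written as an Ansible inventory roughly as follows. This is only a sketch in Ansible's YAML inventory format. The host names come from the table, but every group name (`masters`, `clients`, `slaves`, `kafka_brokers`, `management`) is a hypothetical placeholder and is not taken from this project's example inventory files. Check the examples shipped with the playbooks for the group names the roles actually expect.

```yaml
# Sketch of an inventory for the medium cluster (YAML inventory format).
# Host names follow the table above; all group names are hypothetical.
all:
  children:
    masters:          # NameNode, ResourceManager, JournalNode, ZooKeeper
      hosts:
        master01:
        master02:
        master03:
    clients:          # Hadoop client, Spark client, Zeppelin
      hosts:
        client01:
    slaves:           # DataNode, NodeManager
      hosts:
        slave01:
        slave02:
        slave03:
        slave04:
        slave05:
    kafka_brokers:    # Kafka broker
      hosts:
        kafka01:
        kafka02:
        kafka03:
    management:       # Ambari server
      hosts:
        manage:
```

Grouping hosts by role like this lets per-role settings live in `group_vars`, which is how the basic configurations in this project are provided.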
Servers for large cluster
| Server | Use for | 
|---|---|
| master01 | Primary NameNode, Ganglia Slave | 
| master02 | Standby NameNode, Ganglia Slave | 
| master03 | Primary ResourceManager, Ganglia Slave | 
| master04 | Standby ResourceManager, Ganglia Slave | 
| master05 | JournalNode, ZooKeeper Server(id=1), Ganglia Slave | 
| master06 | JournalNode, ZooKeeper Server(id=2), Ganglia Slave | 
| master07 | JournalNode, ZooKeeper Server(id=3), Ganglia Slave | 
| master08 | HistoryServer, Ganglia Master, Ganglia Slave, InfluxDB, Grafana | 
| client01 | Hadoop Client, Spark Core, Ganglia Slave, Zeppelin | 
| slave01 | DataNode, NodeManager, Ganglia Slave | 
| slave02 | DataNode, NodeManager, Ganglia Slave | 
| slave03 | DataNode, NodeManager, Ganglia Slave | 
| slave04 | DataNode, NodeManager, Ganglia Slave | 
| slave05 | DataNode, NodeManager, Ganglia Slave | 
| slave06 | DataNode, NodeManager, Ganglia Slave | 
| slave07 | DataNode, NodeManager, Ganglia Slave | 
| slave08 | DataNode, NodeManager, Ganglia Slave | 
| slave09 | DataNode, NodeManager, Ganglia Slave | 
| slave10 | DataNode, NodeManager, Ganglia Slave | 
| kafka01 | Kafka broker | 
| kafka02 | Kafka broker | 
| kafka03 | Kafka broker | 
| manage | Ambari server | 
Server for pseudo environment
| Server | Use for | 
|---|---|
| pseudo | NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, Spark, Spark History Server | 
1.4. Software information
| Software | Version | 
|---|---|
| OS | (I use CentOS 7) | 
| Ansible | (I use 2.9.9) | 
| Hadoop | 2.8.5 (Bigtop 1.4.0) | 
| Spark | 2.2.3 (Bigtop 1.4.0) | 
| Spark Community version | 3.0.0 | 
1.5. Prerequisites
- You can log in to each server via SSH from the server where you run Ansible.
- The admin user can run “sudo” on each server.
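
One possible way to confirm both prerequisites before running the real playbooks is a tiny check playbook like the one below. It is a minimal sketch and not part of this project: the file name `check_prereq.yml` is hypothetical, and it assumes your inventory already lists all target servers. The `ping` module only succeeds if Ansible can connect over SSH, and `become: true` makes the task run through sudo, so a failure on either prerequisite shows up as a failed or unreachable host.

```yaml
# check_prereq.yml (hypothetical file name, not part of this project)
- name: Check SSH login and sudo on every server
  hosts: all
  gather_facts: false   # keep the check fast; no facts are needed
  become: true          # run the task through sudo
  tasks:
    - name: Ping each server over SSH with privilege escalation
      ping:
```

You could run it with something like `ansible-playbook -i <your inventory file> check_prereq.yml`, adding `--ask-pass` or `--ask-become-pass` if your environment needs passwords for SSH or sudo.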