1. Abstract¶
1.1. About playbooks¶
This is a library of playbooks to construct HDFS/YARN clusters with some kinds of Big Data tools, such as Apache Spark. You can construct a Hadoop cluster with HA as well as a pseudo Hadoop environment.
The roles contains only basic configurations. I recommend that you customize or parameterize roles to configure systems appropriately for your workload.
1.2. Feature¶
The project contains the following features.
Examples of the inventry file of Ansible
Basic configurations provided via role variables and group_vars
Roles to configure and operate middleware
Playbooks to configure and operate middleware
The main products which this project can deploy are:
Bigtop based Apache Hadoop cluster * Pseudo environment * Distributed environment with NameNode and ResourceManager HA
Bigtop and community based Apache Spark
1.3. Servers¶
This project’s assumption about middleware components and servers.
Servers for medium cluster
Server |
Use for |
---|---|
master01 |
Primary NameNode, JournalNode, ZooKeeper Server(id=1), Ganglia Slave |
master02 |
JournalNode, ZooKeeper Server(id=2), Primary ResourceManager, Ganglia Slave |
master03 |
JournalNode, ZooKeeper Server(id=3), HistoryServer, Standby ResourceManager, Standby NameNode, Ganglia Slave, Ganglia Master, InfluxDB, Grafana, Spark History Server |
client01 |
Hadoop Client, Spark Client, Ganglia Slave, Zeppelin |
slave01 |
DataNode, NodeManager, Ganglia Slave |
slave02 |
DataNode, NodeManager, Ganglia Slave |
slave03 |
DataNode, NodeManager, Ganglia Slave |
slave04 |
DataNode, NodeManager, Ganglia Slave |
slave05 |
DataNode, NodeManager, Ganglia Slave |
kafka01 |
Kafka broker |
kafka02 |
Kafka broker |
kafka03 |
Kafka broker |
manage |
Ambari server |
Servers for large cluster
Server |
Use for |
---|---|
master01 |
Primary NameNode, Ganglia Slave |
master02 |
Standby NameNode, Ganglia Slave |
master03 |
Primary ResourceManager, Ganglia Slave |
master04 |
Standby ResourceManager, Ganglia Slave |
master05 |
JournalNode, ZooKeeper Server(id=1), Ganglia Slave |
master06 |
JournalNode, ZooKeeper Server(id=2), Ganglia Slave |
master07 |
JournalNode, ZooKeeper Server(id=3), Ganglia Slave |
master08 |
HistoryServer, Ganglia Master, Ganglia Slave, InfluxDB, Grafana |
client01 |
Hadoop Client, Spark Core, Ganglia Slave, Zeppelin |
slave01 |
DataNode, NodeManager, Ganglia Slave |
slave02 |
DataNode, NodeManager, Ganglia Slave |
slave03 |
DataNode, NodeManager, Ganglia Slave |
slave04 |
DataNode, NodeManager, Ganglia Slave |
slave05 |
DataNode, NodeManager, Ganglia Slave |
slave06 |
DataNode, NodeManager, Ganglia Slave |
slave07 |
DataNode, NodeManager, Ganglia Slave |
slave08 |
DataNode, NodeManager, Ganglia Slave |
slave09 |
DataNode, NodeManager, Ganglia Slave |
slave10 |
DataNode, NodeManager, Ganglia Slave |
kafka01 |
Kafka broker |
kafka02 |
Kafka broker |
kafka03 |
Kafka broker |
manage |
Ambari server |
Server for pseudo environment
Server |
Use for |
---|---|
pseudo |
NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, Spark, Spark History Server |
1.4. Software information¶
Software |
Version |
---|---|
OS |
(I use CentOS 7) |
Ansible |
(I use 2.9.9) |
Hadoop |
2.8.5 (Bigtop 1.4.0) |
Spark |
2.2.3 (Bigtop 1.4.0) |
Spark Community version |
3.0.0 |
1.5. Prerequirement¶
Login to each server by SSH from the server where you execute ansible.
“sudo” as admin user in each server.