5. About playbooks
This project has two types of playbooks.
Playbooks for configuration
These are used to install middleware and to configure OS and middleware parameters.
Playbooks for operation
These are used to operate OS and middleware services.
5.1. Playbooks for configuration
The playbooks in the “playbooks/conf” directory configure nodes.
This section gives a short description of each playbook.
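Every configuration playbook is run in the usual Ansible way, with ansible-playbook pointed at an inventory of target nodes. As a minimal sketch (the inventory file name, group name, and host names below are illustrative, not taken from this project):

```yaml
# Hypothetical YAML-format inventory, e.g. saved as "hosts".
# The group and host names are examples only.
all:
  children:
    hadoop_servers:
      hosts:
        node01:
        node02:
```

A playbook such as playbooks/conf/common/common_all.yml would then be applied with `ansible-playbook -i hosts playbooks/conf/common/common_all.yml`.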
5.1.1. common
This is a set of common and basic configurations, including OS parameters.
playbooks/conf/common/common_all.yml
The playbook which provides all of the basic configurations.
playbooks/conf/common/common_only_common.yml
The playbook which applies only the configurations in the “common” role.
5.1.2. cdh5
This is a set of configurations to construct a CDH5 environment.
cdh5_all.yml
A comprehensive playbook which includes all of the other playbooks; it builds the whole CDH5 environment.
cdh5_cl.yml
This playbook executes the basic roles and the “cdh5_cl” role to build a Hadoop client environment.
cdh5_journalnode.yml
This playbook executes the basic roles and the “cdh5_jn” role to build an HDFS JournalNode environment.
cdh5_namenode.yml
This playbook executes the basic roles and the “cdh5_nn” role to build an HDFS NameNode environment.
cdh5_other.yml
This playbook executes the basic roles and the “cdh5_ot” role to build the MapReduce HistoryServer and YARN proxy environments.
cdh5_resourcemanager.yml
This playbook executes the basic roles and the “cdh5_rm” role to build a YARN ResourceManager environment.
cdh5_slave.yml
This playbook executes the basic roles and the “cdh5_sl” role to build the HDFS DataNode and YARN NodeManager environments.
cdh5_spark.yml
This playbook executes the basic roles and the “cdh5_spark” role to build a Spark Core environment on the client node.
cdh5_zookeeper.yml
This playbook executes the basic roles and the “zookeeper_server” role to build a ZooKeeper environment.
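The per-component playbooks above all follow the same pattern: apply the shared baseline roles, then the component role. A hypothetical sketch of what such a wrapper looks like (the host group name is illustrative; the role names are the ones listed above):

```yaml
# Illustrative wrapper in the style of cdh5_namenode.yml:
# shared baseline first, then the component-specific role.
- hosts: namenodes        # example group name, not from this project
  become: true
  roles:
    - common              # basic OS/middleware configuration
    - cdh5_nn             # HDFS NameNode role
```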
5.1.3. cdh5_pseudo
This is a set of configurations to construct a CDH5 pseudo-distributed environment.
cdh5_pseudo.yml
Builds the whole CDH5 pseudo-distributed environment.
cdh5_spark.yml
Builds a Spark environment on top of the CDH5 pseudo-distributed environment.
5.1.4. ansible
This is a set of configurations for the Ansible environment itself. If you have configured the Ansible environment manually (ansible.cfg, the inventory file, and so on), you do not need these playbooks.
ansible_client.yml
This playbook executes the “ansible” role to configure the nodes from which the ansible command is executed.
ansible_remote.yml
This playbook executes the “ansible_remote” role to configure the nodes that are managed by Ansible.
5.1.5. ganglia
This is a set of configurations for Ganglia. There are separate playbooks for the Ganglia master and slaves.
ganglia_all.yml
A wrapper playbook that configures both the Ganglia master and the slaves.
ganglia_master.yml
The playbook to configure the Ganglia master.
ganglia_slave.yml
The playbook to configure the Ganglia slaves.
5.1.6. influxdb
all.yml
Configure InfluxDB and Grafana.
5.1.7. spark_comm
all.yml
Configure all nodes.
spark_base.yml
Execute the basic Spark configuration.
spark_client.yml
Configure a client environment for developing Spark applications.
spark_history.yml
Configure the environment to run the Spark history server.
spark_libs.yml
Configure the environment to use the native libraries in MLlib.
5.1.8. zeppelin
zeppelin.yml
Configure the Zeppelin environment.
5.1.9. fluentd
fluentd.yml
Configure Fluentd.
td_agent.yml
Configure td-agent.
5.1.10. kafka
kafka_brocker.yml
Configure Kafka broker nodes.
5.1.11. confluent
kafka_broker.yml
Configure Confluent Kafka brokers.
kafka_schema.yml
Configure the Confluent Schema Registry.
kafka_rest.yml
Configure the Confluent REST Proxy.
5.1.12. ambari
ambari_agent.yml
Configure the Ambari agent manually (not through the Ambari server).
ambari_server.yml
Configure the Ambari server.
5.1.13. jenkins
jenkins.yml
Configure the Jenkins server.
5.1.14. anacondace
anacondace2.yml
Configure Anaconda2 CE.
anacondace3.yml
Configure Anaconda3 CE.
5.1.15. postgresql
postgresql.yml
Configure PostgreSQL.
5.1.16. cdh5_hive
cdh5_hive.yml
Configure Hive and PostgreSQL.
5.1.17. alluxio_yarn
alluxio_yarn.yml
Configure Alluxio on YARN (the client and slave nodes).
5.1.18. tpc_ds
tpc_ds.yml
Configure the TPC-DS packages.
5.1.19. tensorflow
anaconda.yml
Configure Anaconda3 CE.
gpu_env.yml
Configure the CUDA and cuDNN environment.
keras.yml
Configure the Keras and TensorFlow environment (CPU).
keras_gpu.yml
Configure the Keras and TensorFlow environment (GPU).
5.2. Playbooks for operation
The playbooks in the “playbooks/operation” directory initialize and manage services.
This section gives a short description of each playbook.
5.2.1. cdh5
This is a set of operations for Hadoop services. See the README in the cdh5 directory for more information.
5.2.2. ec2
This is a set of operations for booting EC2 instances. See the README in the ec2 directory for more information.
5.2.3. influxdb
create_db.yml
Create all of the databases in InfluxDB.
create_graphite_db.yml
Create the InfluxDB database which holds data gathered via the Graphite protocol. This is mainly used by Spark.
create_grafana_db.yml
Create the InfluxDB database which holds Grafana’s dashboard data.
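A database-creation play of this kind can be sketched with the influx command-line client (the host group and database name are illustrative; the actual playbooks may differ):

```yaml
# Hypothetical sketch: create a database for Graphite-protocol data.
- hosts: influxdb_servers       # example group name
  tasks:
    - name: Create the graphite database
      command: influx -execute 'CREATE DATABASE graphite'
```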
5.2.4. spark_comm
make_spark_packages.yml
Compile the Spark sources and build the packages.
start_spark_historyserver.yml
Start the Spark history server.
stop_spark_historyserver.yml
Stop the Spark history server.
5.2.5. zeppelin
build.yml
Compile and package Zeppelin. This is a helper playbook for building Zeppelin according to the official Zeppelin web site.
restart_zeppelin.yml
Stop and start the Zeppelin services.
start_zeppelin.yml
Start the Zeppelin services by executing zeppelin-daemon.sh.
stop_zeppelin.yml
Stop the Zeppelin services by executing zeppelin-daemon.sh.
5.2.6. fluentd
restart_td_agent.yml
Stop and start td-agent.
start_td_agent.yml
Start td-agent.
stop_td_agent.yml
Stop td-agent.
5.2.7. kafka
restart_kafka.yml
Stop and start Kafka.
start_kafka.yml
Start Kafka.
stop_kafka.yml
Stop Kafka.
create_topic.yml
Create a topic on the Kafka cluster.
delete_topic.yml
Delete a topic on the Kafka cluster.
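Topic management of this kind is typically a thin wrapper around Kafka’s own CLI. A hypothetical sketch of what a create_topic.yml-style play might do (the group name, variable name, and partition/replication settings are all illustrative, not taken from this project):

```yaml
# Hypothetical sketch: create a topic with Kafka's CLI from one broker.
- hosts: kafka_brokers          # example group name
  tasks:
    - name: Create a Kafka topic
      command: >
        kafka-topics.sh --create
        --zookeeper localhost:2181
        --topic {{ topic_name }}
        --partitions 3 --replication-factor 2
```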
5.2.8. confluent
restart_kafka_rest.yml
Stop and start the REST Proxy service.
restart_kafka_server.yml
Stop and start the Kafka broker service.
restart_zookeeper_server.yml
Stop and start the ZooKeeper service.
If you configured a ZooKeeper service on the Kafka broker nodes, you can use this playbook to control it.
start_kafka_rest.yml
Start the Kafka REST Proxy service.
start_kafka_server.yml
Start the Kafka broker service.
start_schema_registry.yml
Start the Confluent Schema Registry service.
start_zookeeper_server.yml
Start the ZooKeeper service.
If you configured a ZooKeeper service on the Kafka broker nodes, you can use this playbook to control it.
stop_kafka_rest.yml
Stop the Kafka REST Proxy service.
stop_kafka_server.yml
Stop the Kafka broker service.
stop_schema_registry.yml
Stop the Confluent Schema Registry service.
stop_zookeeper_server.yml
Stop the ZooKeeper service.
If you configured a ZooKeeper service on the Kafka broker nodes, you can use this playbook to control it.
5.2.9. ambari
setup.yml
Set up the Ambari server.
The following playbooks start, stop, or restart individual services (or all of them):
restart_all.yml
restart_ambari_metrics.yml
restart_hdfs.yml
restart_yarn.yml
restart_zookeeper.yml
start_all.yml
start_ambari_metrics.yml
start_hdfs.yml
start_yarn.yml
start_zookeeper.yml
stop_all.yml
stop_ambari_metrics.yml
stop_hdfs.yml
stop_yarn.yml
stop_zookeeper.yml
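Since Ambari manages the services itself, plays of this kind are most naturally written against Ambari’s REST API. A hypothetical sketch of a start_hdfs.yml-style play using Ansible’s uri module (the cluster name, port, and credentials are illustrative; the actual playbooks may work differently):

```yaml
# Hypothetical sketch: ask Ambari to start the HDFS service via REST.
- hosts: ambari_server          # example group name
  tasks:
    - name: Start HDFS through the Ambari API
      uri:
        url: "http://localhost:8080/api/v1/clusters/mycluster/services/HDFS"
        method: PUT
        user: admin
        password: admin
        force_basic_auth: true
        headers:
          X-Requested-By: ansible
        body_format: json
        body:
          RequestInfo:
            context: Start HDFS
          Body:
            ServiceInfo:
              state: STARTED
        status_code: [200, 202]
```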
5.2.10. postgresql
initdb.yml
Initialize (set up) the database.
The following playbooks start, stop, or restart PostgreSQL:
start_postgresql.yml
stop_postgresql.yml
restart_postgresql.yml
5.2.11. cdh5_hive
create_metastore_db.yml
Create (set up) the metastore database.
The following playbooks start and stop the Hive metastore service:
start_metastore.yml
stop_metastore.yml
5.2.12. deploy_yarn
deploy_alluxio.yml
Deploy the Alluxio application to YARN.