5. About playbooks
This project has two types of playbooks.
- Playbooks for configuration - These install middleware and configure OS and middleware parameters.

- Playbooks for operation - These operate OS services and middleware services.
 
5.1. Playbooks for configuration
The playbooks in the “playbooks/conf” directory configure nodes.
This section gives a short description of each playbook.
5.1.1. common
This is a set of common, basic configurations, including OS parameters.
- playbooks/conf/common/common_all.yml - Applies all basic configurations.

- playbooks/conf/common/common_only_common.yml - Applies only the configurations in the “common” role.
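
As an illustration of how a configuration playbook might be run, the sketch below assembles an `ansible-playbook` command line. It assumes the command is issued from the repository root, that each set lives in a subdirectory of playbooks/conf named after its section (the text above confirms this only for common), and that `hosts` is a hypothetical inventory file. The helper name `conf_playbook_cmd` is invented for this example. The function only prints the command, so nothing runs until you execute the printed line yourself.

```shell
# Sketch: print the ansible-playbook command for a configuration playbook.
# ASSUMPTIONS: run from the repository root; "hosts" is a hypothetical
# inventory file; sets live under playbooks/conf/<set>/ (confirmed only
# for "common" in the text above).
conf_playbook_cmd() {
  # $1 = playbook set (e.g. common), $2 = playbook file, $3 = inventory (optional)
  printf 'ansible-playbook -i %s playbooks/conf/%s/%s\n' "${3:-hosts}" "$1" "$2"
}

# Print the command that would apply all basic configurations.
conf_playbook_cmd common common_all.yml
```

Passing a third argument swaps in a different inventory file, for example `conf_playbook_cmd common common_only_common.yml inventory.ini`.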
 
5.1.2. cdh5
This is a set of configurations to construct a CDH5 environment.
- cdh5_all.yml - A comprehensive playbook that includes all the other playbooks in this set. Use it to build a whole CDH5 environment.

- cdh5_cl.yml - Executes the basic roles and the “cdh5_cl” role to build a Hadoop client environment.

- cdh5_journalnode.yml - Executes the basic roles and the “cdh5_jn” role to build an HDFS JournalNode environment.

- cdh5_namenode.yml - Executes the basic roles and the “cdh5_nn” role to build an HDFS NameNode environment.

- cdh5_other.yml - Executes the basic roles and the “cdh5_ot” role to build the MapReduce HistoryServer and YARN Proxy environments.

- cdh5_resourcemanager.yml - Executes the basic roles and the “cdh5_rm” role to build a YARN ResourceManager environment.

- cdh5_slave.yml - Executes the basic roles and the “cdh5_sl” role to build the HDFS DataNode and YARN NodeManager environments.

- cdh5_spark.yml - Executes the basic roles and the “cdh5_spark” role to build a Spark Core environment on the client node.

- cdh5_zookeeper.yml - Executes the basic roles and the “zookeeper_server” role to build a ZooKeeper environment.
 
5.1.3. cdh5_pseudo
This is a set of configurations to construct a CDH5 pseudo environment.
- cdh5_pseudo.yml - Builds a whole CDH5 pseudo environment.

- cdh5_spark.yml - Builds a Spark environment on the CDH5 pseudo environment.
 
5.1.4. ansible
This is a set of configurations for the Ansible environment. If you have already configured the Ansible environment manually (ansible.cfg, the inventory file, and so on), you don’t need these playbooks.
- ansible_client.yml - Executes the “ansible” role to configure the nodes on which the ansible command is run.

- ansible_remote.yml - Executes the “ansible_remote” role to configure the nodes that are managed by Ansible.
 
5.1.5. ganglia
This is a set of configurations for Ganglia. There is one playbook each for the Ganglia master and slave, plus a wrapper that runs both.
- ganglia_all.yml - Wrapper playbook that configures both the Ganglia master and slave.

- ganglia_master.yml - Configures the Ganglia master.

- ganglia_slave.yml - Configures the Ganglia slave.
 
5.1.6. influxdb
- all.yml - Configure InfluxDB and Grafana.
 
5.1.7. spark_comm
- all.yml - Configure all nodes 
 
- spark_base.yml - Execute basic configuration of Spark 
 
- spark_client.yml - Configure client environment to develop Spark applications 
 
- spark_history.yml - Configure environment to run Spark history server 
 
- spark_libs.yml - Configure library environment to use native libraries in MLlib 
 
5.1.8. zeppelin
- zeppelin.yml - Configure the Zeppelin environment
 
5.1.9. fluentd
- fluentd.yml - Configure Fluentd
 
- td_agent.yml - Configure td-agent 
 
5.1.10. kafka
- kafka_brocker.yml - Configure Kafka broker nodes. 
 
5.1.11. confluent
- kafka_broker.yml - Configure Confluent Kafka brokers 
 
- kafka_schema.yml - Configure Confluent Schema Registry 
 
- kafka_rest.yml - Configure Confluent REST Proxy 
 
5.1.12. ambari
- ambari_agent.yml - Configure the Ambari agent manually (not through the Ambari server)
 
- ambari_server.yml - Configure Ambari server 
 
5.1.13. jenkins
- jenkins.yml - Configure Jenkins server 
 
5.1.14. anacondace
- anacondace2.yml - Configure Anaconda2 CE 
 
- anacondace3.yml - Configure Anaconda3 CE 
 
5.1.15. postgresql
- postgresql.yml - Configure PostgreSQL 
 
5.1.16. cdh5_hive
- cdh5_hive.yml - Configure Hive and PostgreSQL 
 
5.1.17. alluxio_yarn
- alluxio_yarn.yml - Configure Alluxio on YARN on the client and slave nodes
 
 
5.1.18. tpc_ds
- tpc_ds.yml - Configure TPC-DS packages 
 
5.1.19. tensorflow
- anaconda.yml - Configure Anaconda3 CE 
 
- gpu_env.yml - Configure CUDA and cuDNN environment 
 
- keras.yml - Configure Keras and TensorFlow environment (Use CPU) 
 
- keras_gpu.yml - Configure Keras and TensorFlow environment (Use GPU) 
 
5.2. Playbooks for operation
The playbooks in the “playbooks/operation” directory initialize and manage services.
This section gives a short description of each playbook.
5.2.1. cdh5
This is a set of operations for Hadoop services. See the README in the cdh5 directory for more information.
5.2.2. ec2
This is a set of operations to boot EC2 instances. See the README in the ec2 directory for more information.
5.2.3. influxdb
- create_db.yml - Create all databases in InfluxDB.

- create_graphite_db.yml - Create the InfluxDB database that holds data gathered over the Graphite protocol. This is mainly used by Spark.

- create_grafana_db.yml - Create the InfluxDB database that holds Grafana’s dashboard data.
 
5.2.4. spark_comm
- make_spark_packages.yml - Compile Spark sources and build packages 
 
- start_spark_historyserver.yml - Start Spark’s history server 
 
- stop_spark_historyserver.yml - Stop Spark’s history server 
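
The start and stop playbooks above differ only in their prefix, so a small wrapper can pick one by action. The following is a minimal sketch under stated assumptions: the command is run from the repository root, the playbooks live in a spark_comm subdirectory of playbooks/operation (inferred from the section name, not confirmed by the text), and `hosts` is a hypothetical inventory file. The function name `spark_history_cmd` is made up for this example, and it prints the command rather than running it.

```shell
# Sketch: print the command that starts or stops the Spark history server.
# ASSUMPTIONS: run from the repository root; "hosts" is a hypothetical
# inventory file; the playbooks live under playbooks/operation/spark_comm/.
spark_history_cmd() {
  # $1 = action; only "start" and "stop" playbooks are listed above.
  # $2 = inventory file (optional)
  case "$1" in
    start|stop)
      printf 'ansible-playbook -i %s playbooks/operation/spark_comm/%s_spark_historyserver.yml\n' \
        "${2:-hosts}" "$1"
      ;;
    *)
      echo "unsupported action: $1 (expected start or stop)" >&2
      return 1
      ;;
  esac
}

# Print the command that would start the history server.
spark_history_cmd start
```

Any action other than start or stop is rejected with a non-zero status, because no other history server playbook is listed in this section.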
 
5.2.5. zeppelin
- build.yml - Compile and package Zeppelin. This is a helper playbook; you can also build Zeppelin by following the instructions on the official Zeppelin web site.

- restart_zeppelin.yml - Stop and start Zeppelin services

- start_zeppelin.yml - Start Zeppelin services by executing zeppelin-daemon.sh

- stop_zeppelin.yml - Stop Zeppelin services by executing zeppelin-daemon.sh
 
5.2.6. fluentd
- restart_td_agent.yml - Stop and start td-agent
 
- start_td_agent.yml - Start td-agent 
 
- stop_td_agent.yml - Stop td-agent 
 
5.2.7. kafka
- restart_kafka.yml - Stop and start Kafka

- start_kafka.yml - Start Kafka

- stop_kafka.yml - Stop Kafka

- create_topic.yml - Create a topic on the Kafka cluster

- delete_topic.yml - Delete a topic on the Kafka cluster
 
5.2.8. confluent
- restart_kafka_rest.yml - Stop and start the Kafka REST Proxy service

- restart_kafka_server.yml - Stop and start the Kafka broker service

- restart_zookeeper_server.yml - Stop and start the ZooKeeper service. If you configured the ZooKeeper service on the Kafka broker nodes, you can use this playbook to control it.

- start_kafka_rest.yml - Start the Kafka REST Proxy service

- start_kafka_server.yml - Start the Kafka broker service

- start_schema_registry.yml - Start the Confluent Schema Registry service

- start_zookeeper_server.yml - Start the ZooKeeper service. If you configured the ZooKeeper service on the Kafka broker nodes, you can use this playbook to control it.

- stop_kafka_rest.yml - Stop the Kafka REST Proxy service

- stop_kafka_server.yml - Stop the Kafka broker service

- stop_schema_registry.yml - Stop the Confluent Schema Registry service

- stop_zookeeper_server.yml - Stop the ZooKeeper service. If you configured the ZooKeeper service on the Kafka broker nodes, you can use this playbook to control it.
 
5.2.9. ambari
- Set up the Ambari server
  - setup.yml

- Start and stop each service
  - restart_all.yml
  - restart_ambari_metrics.yml
  - restart_hdfs.yml
  - restart_yarn.yml
  - restart_zookeeper.yml
  - start_all.yml
  - start_ambari_metrics.yml
  - start_hdfs.yml
  - start_yarn.yml
  - start_zookeeper.yml
  - stop_all.yml
  - stop_ambari_metrics.yml
  - stop_hdfs.yml
  - stop_yarn.yml
  - stop_zookeeper.yml
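
The service playbook names listed above follow an `<action>_<service>.yml` pattern: three actions (restart, start, stop) crossed with five targets (all, ambari_metrics, hdfs, yarn, zookeeper). The sketch below regenerates the same fifteen names from that pattern, which can be handy when scripting over them. The function name `ambari_playbooks` is hypothetical, and setup.yml falls outside the pattern, so it is not generated.

```shell
# Sketch: list the Ambari operation playbook names implied by the
# <action>_<service>.yml pattern seen in the list above.
# ASSUMPTION: "ambari_playbooks" is an illustrative helper name only.
ambari_playbooks() {
  for action in restart start stop; do
    for service in all ambari_metrics hdfs yarn zookeeper; do
      printf '%s_%s.yml\n' "$action" "$service"
    done
  done
}

# Print all fifteen names, one per line.
ambari_playbooks
```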
 
5.2.10. postgresql
- Set up the database
  - initdb.yml

- Start and stop PostgreSQL
  - start_postgresql.yml
  - stop_postgresql.yml
  - restart_postgresql.yml
 
5.2.11. cdh5_hive
- Set up the metastore database
  - create_metastore_db.yml

- Start and stop services
  - start_metastore.yml
  - stop_metastore.yml
 
5.2.12. deploy_yarn
- Deploy the Alluxio application to YARN
  - deploy_alluxio.yml