5. About playbooks

This project has two types of playbooks: playbooks for configuration and playbooks for operation.

5.1. Playbooks for configuration

The playbooks in the “playbooks/conf” directory configure nodes.

This section gives a short description of each playbook.

5.1.1. common

This is a set of common and basic configurations including OS parameters.

  • playbooks/conf/common/common_all.yml

    • The playbook which provides all basic configurations.

  • playbooks/conf/common/common_only_common.yml

    • The playbook that applies only the configurations in the “common” role.

5.1.2. cdh5

This is a set of configurations to construct a CDH5 environment.

  • cdh5_all.yml

    • This comprehensive playbook includes all the other playbooks. Use it to build a whole CDH5 environment.

  • cdh5_cl.yml

    • This playbook executes the basic roles and the “cdh5_cl” role to build a Hadoop client environment.

  • cdh5_journalnode.yml

    • This playbook executes the basic roles and the “cdh5_jn” role to build an HDFS JournalNode environment.

  • cdh5_namenode.yml

    • This playbook executes the basic roles and the “cdh5_nn” role to build an HDFS NameNode environment.

  • cdh5_other.yml

    • This playbook executes the basic roles and the “cdh5_ot” role to build the MapReduce HistoryServer and YARN Proxy environments.

  • cdh5_resourcemanager.yml

    • This playbook executes the basic roles and the “cdh5_rm” role to build a YARN ResourceManager environment.

  • cdh5_slave.yml

    • This playbook executes the basic roles and the “cdh5_sl” role to build the HDFS DataNode and YARN NodeManager environments.

  • cdh5_spark.yml

    • This playbook executes the basic roles and the “cdh5_spark” role to build a Spark Core environment on the client node.

  • cdh5_zookeeper.yml

    • This playbook executes the basic roles and the “zookeeper_server” role to build a ZooKeeper environment.
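
As a rough illustration of how a wrapper such as cdh5_all.yml could pull in the per-role playbooks, here is a minimal sketch. It assumes an Ansible version that supports import_playbook (older releases used include for the same purpose). The import order is a guess, and the real cdh5_all.yml may be organized differently.

```yaml
# Sketch only: NOT the actual contents of cdh5_all.yml.
# The import order below is an assumption.
- import_playbook: cdh5_zookeeper.yml
- import_playbook: cdh5_journalnode.yml
- import_playbook: cdh5_namenode.yml
- import_playbook: cdh5_resourcemanager.yml
- import_playbook: cdh5_slave.yml
- import_playbook: cdh5_other.yml
- import_playbook: cdh5_cl.yml
- import_playbook: cdh5_spark.yml
```

Because each imported playbook targets its own hosts, running the wrapper once should be roughly equivalent to running the individual playbooks one after another.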

5.1.3. cdh5_pseudo

This is a set of configurations to construct a CDH5 pseudo environment.

  • cdh5_pseudo.yml

    • Build the whole CDH5 pseudo environment.

  • cdh5_spark.yml

    • Build a Spark environment on the CDH5 pseudo environment.

5.1.4. ansible

This is a set of configurations for the Ansible environment. If you have already configured the Ansible environment manually (ansible.cfg, the inventory file, and so on), you don’t need these playbooks.

  • ansible_client.yml

    • This playbook executes the “ansible” role to configure the nodes on which the ansible command is executed.

  • ansible_remote.yml

    • This playbook executes the “ansible_remote” role to configure the nodes that are managed by Ansible.
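
For readers who prepare the Ansible environment by hand, an inventory can be as small as the sketch below. It uses Ansible's YAML inventory format. The group names and host names are hypothetical placeholders; the groups must match whatever the playbooks' hosts lines actually expect.

```yaml
# Hypothetical inventory: group and host names are placeholders,
# not names taken from this project.
all:
  children:
    hadoop_master:
      hosts:
        master01.example.com:
    hadoop_slave:
      hosts:
        slave01.example.com:
        slave02.example.com:
```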

5.1.5. ganglia

This is a set of configurations for Ganglia. There are separate playbooks for the Ganglia master and slave.

  • ganglia_all.yml

    • The wrapper playbook that configures both the Ganglia master and slave

  • ganglia_master.yml

    • The playbook to configure Ganglia master

  • ganglia_slave.yml

    • The playbook to configure Ganglia slave

5.1.6. influxdb

  • all.yml

    • Configure InfluxDB and Grafana.

5.1.7. spark_comm

  • all.yml

    • Configure all nodes

  • spark_base.yml

    • Execute basic configuration of Spark

  • spark_client.yml

    • Configure client environment to develop Spark applications

  • spark_history.yml

    • Configure environment to run Spark history server

  • spark_libs.yml

    • Configure library environment to use native libraries in MLlib

5.1.8. zeppelin

  • zeppelin.yml

    • Configure the Zeppelin environment

5.1.9. fluentd

  • fluentd.yml

    • Configure Fluentd

  • td_agent.yml

    • Configure td-agent

5.1.10. kafka

  • kafka_brocker.yml

    • Configure Kafka broker nodes.

5.1.11. confluent

  • kafka_broker.yml

    • Configure Confluent Kafka brokers

  • kafka_schema.yml

    • Configure Confluent Schema Registry

  • kafka_rest.yml

    • Configure Confluent REST Proxy

5.1.12. ambari

  • ambari_agent.yml

    • Configure the Ambari agent manually (not through the Ambari server)

  • ambari_server.yml

    • Configure Ambari server

5.1.13. jenkins

  • jenkins.yml

    • Configure Jenkins server

5.1.14. anacondace

  • anacondace2.yml

    • Configure Anaconda2 CE

  • anacondace3.yml

    • Configure Anaconda3 CE

5.1.15. postgresql

  • postgresql.yml

    • Configure PostgreSQL

5.1.16. cdh5_hive

  • cdh5_hive.yml

    • Configure Hive and PostgreSQL

5.1.17. alluxio_yarn

  • alluxio_yarn.yml

    • Configure Alluxio on YARN

      • Configure client and slave nodes

5.1.18. tpc_ds

  • tpc_ds.yml

    • Configure TPC-DS packages

5.1.19. tensorflow

  • anaconda.yml

    • Configure Anaconda3 CE

  • gpu_env.yml

    • Configure CUDA and cuDNN environment

  • keras.yml

    • Configure Keras and TensorFlow environment (Use CPU)

  • keras_gpu.yml

    • Configure Keras and TensorFlow environment (Use GPU)

5.2. Playbooks for operation

The playbooks in the “playbooks/operation” directory initialize and manage services.

This section gives a short description of each playbook.

5.2.1. cdh5

This is a set of operations for Hadoop services. See the README in the cdh5 directory for more information.

5.2.2. ec2

This is a set of operations to boot EC2 instances. See the README in the ec2 directory for more information.

5.2.3. influxdb

  • create_db.yml

    • Create all databases in InfluxDB.

  • create_graphite_db.yml

    • Create the InfluxDB database that holds data gathered via the Graphite protocol. This is mainly used by Spark.

  • create_grafana_db.yml

    • Create the InfluxDB database that holds Grafana’s dashboard data.

5.2.4. spark_comm

  • make_spark_packages.yml

    • Compile Spark sources and build packages

  • start_spark_historyserver.yml

    • Start Spark’s history server

  • stop_spark_historyserver.yml

    • Stop Spark’s history server

5.2.5. zeppelin

  • build.yml

    • Compile and package Zeppelin

    • This is a helper playbook for building Zeppelin. You can also build Zeppelin by following the instructions on the official Zeppelin web site.

  • restart_zeppelin.yml

    • Stop and start the Zeppelin services

  • start_zeppelin.yml

    • Start the Zeppelin services by executing zeppelin-daemon.sh

  • stop_zeppelin.yml

    • Stop the Zeppelin services by executing zeppelin-daemon.sh
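
To make the "stop and start" behaviour concrete, a restart playbook built on zeppelin-daemon.sh might look like the sketch below. The inventory group zeppelin and the zeppelin_home path are hypothetical, and the project's own restart_zeppelin.yml may be implemented differently.

```yaml
# Sketch only: not the project's actual restart_zeppelin.yml.
- name: Restart Zeppelin by stopping and then starting the daemon
  hosts: zeppelin                        # hypothetical inventory group
  vars:
    zeppelin_home: /usr/local/zeppelin   # hypothetical install path
  tasks:
    - name: Stop Zeppelin
      command: "{{ zeppelin_home }}/bin/zeppelin-daemon.sh stop"

    - name: Start Zeppelin
      command: "{{ zeppelin_home }}/bin/zeppelin-daemon.sh start"
```

Keeping stop and start as separate tasks mirrors the separate stop_zeppelin.yml and start_zeppelin.yml playbooks listed above.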

5.2.6. fluentd

  • restart_td_agent.yml

    • Stop and start td-agent

  • start_td_agent.yml

    • Start td-agent

  • stop_td_agent.yml

    • Stop td-agent

5.2.7. kafka

  • restart_kafka.yml

    • Stop and start Kafka

  • start_kafka.yml

    • Start Kafka

  • stop_kafka.yml

    • Stop Kafka

  • create_topic.yml

    • Create a topic on the Kafka cluster

  • delete_topic.yml

    • Delete a topic on the Kafka cluster

5.2.8. confluent

  • restart_kafka_rest.yml

    • Stop and start the Kafka REST Proxy service

  • restart_kafka_server.yml

    • Stop and start the Kafka broker service

  • restart_zookeeper_server.yml

    • Stop and start the ZooKeeper service

    • If you configured the ZooKeeper service on the Kafka broker nodes, you can use this playbook to control those ZooKeeper services.

  • start_kafka_rest.yml

    • Start the Kafka REST Proxy service

  • start_kafka_server.yml

    • Start the Kafka broker service

  • start_schema_registry.yml

    • Start the Confluent Schema Registry service

  • start_zookeeper_server.yml

    • Start the ZooKeeper service

    • If you configured the ZooKeeper service on the Kafka broker nodes, you can use this playbook to control those ZooKeeper services.

  • stop_kafka_rest.yml

    • Stop the Kafka REST Proxy service

  • stop_kafka_server.yml

    • Stop the Kafka broker service

  • stop_schema_registry.yml

    • Stop the Confluent Schema Registry service

  • stop_zookeeper_server.yml

    • Stop the ZooKeeper service

    • If you configured the ZooKeeper service on the Kafka broker nodes, you can use this playbook to control those ZooKeeper services.

5.2.9. ambari

  • Set up the Ambari server

    • setup.yml

  • Start and stop each service

    • restart_all.yml

    • restart_ambari_metrics.yml

    • restart_hdfs.yml

    • restart_yarn.yml

    • restart_zookeeper.yml

    • start_all.yml

    • start_ambari_metrics.yml

    • start_hdfs.yml

    • start_yarn.yml

    • start_zookeeper.yml

    • stop_all.yml

    • stop_ambari_metrics.yml

    • stop_hdfs.yml

    • stop_yarn.yml

    • stop_zookeeper.yml

5.2.10. postgresql

  • Set up the database

    • initdb.yml

  • Start and stop PostgreSQL

    • start_postgresql.yml

    • stop_postgresql.yml

    • restart_postgresql.yml

5.2.11. cdh5_hive

  • Set up the metastore database

    • create_metastore_db.yml

  • Start and stop services

    • start_metastore.yml

    • stop_metastore.yml

5.2.12. deploy_yarn

  • Deploy the Alluxio application to YARN

    • deploy_alluxio.yml