5. About playbooks
This project has two types of playbooks.
Playbooks for configuration
These are used to install middleware and to configure OS and middleware parameters.
Playbooks for operation
These are used to operate OS and middleware services.
5.1. Playbooks for configuration
The playbooks in the “playbooks/conf” directory configure nodes.
This section gives a short description of each playbook.
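Every configuration playbook is run in the usual Ansible way, with ansible-playbook pointed at an inventory of target nodes. As a minimal sketch (the inventory file name, group name, and host names below are illustrative, not taken from this project):

```yaml
# Hypothetical YAML-format inventory, e.g. saved as "hosts".
# The group and host names are examples only.
all:
  children:
    hadoop_servers:
      hosts:
        node01:
        node02:
```

A playbook such as playbooks/conf/common/common_all.yml would then be applied with `ansible-playbook -i hosts playbooks/conf/common/common_all.yml`.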
5.1.1. common
This is a set of common and basic configurations, including OS parameters.
playbooks/conf/common/common_all.yml
The playbook which provides all of the basic configurations.
playbooks/conf/common/common_only_common.yml
The playbook which applies only the configurations in the “common” role.
5.1.2. cdh5
This is a set of configurations to construct a CDH5 environment.
cdh5_all.yml
A comprehensive playbook which includes all of the other playbooks; it builds the whole CDH5 environment.
cdh5_cl.yml
This playbook executes the basic roles and the “cdh5_cl” role to build a Hadoop client environment.
cdh5_journalnode.yml
This playbook executes the basic roles and the “cdh5_jn” role to build an HDFS JournalNode environment.
cdh5_namenode.yml
This playbook executes the basic roles and the “cdh5_nn” role to build an HDFS NameNode environment.
cdh5_other.yml
This playbook executes the basic roles and the “cdh5_ot” role to build the MapReduce HistoryServer and YARN proxy environments.
cdh5_resourcemanager.yml
This playbook executes the basic roles and the “cdh5_rm” role to build a YARN ResourceManager environment.
cdh5_slave.yml
This playbook executes the basic roles and the “cdh5_sl” role to build the HDFS DataNode and YARN NodeManager environments.
cdh5_spark.yml
This playbook executes the basic roles and the “cdh5_spark” role to build a Spark Core environment on the client node.
cdh5_zookeeper.yml
This playbook executes the basic roles and the “zookeeper_server” role to build a ZooKeeper environment.
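The per-component playbooks above all follow the same pattern: apply the shared baseline roles, then the component role. A hypothetical sketch of what such a wrapper looks like (the host group name is illustrative; the role names are the ones listed above):

```yaml
# Illustrative wrapper in the style of cdh5_namenode.yml:
# shared baseline first, then the component-specific role.
- hosts: namenodes        # example group name, not from this project
  become: true
  roles:
    - common              # basic OS/middleware configuration
    - cdh5_nn             # HDFS NameNode role
```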
5.1.3. cdh5_pseudo
This is a set of configurations to construct a CDH5 pseudo-distributed environment.
cdh5_pseudo.yml
Builds the whole CDH5 pseudo-distributed environment.
cdh5_spark.yml
Builds a Spark environment on top of the CDH5 pseudo-distributed environment.
5.1.4. ansible
This is a set of configurations for the Ansible environment itself. If you have configured the Ansible environment manually (ansible.cfg, the inventory file, and so on), you do not need these playbooks.
ansible_client.yml
This playbook executes the “ansible” role to configure the nodes from which the ansible command is executed.
ansible_remote.yml
This playbook executes the “ansible_remote” role to configure the nodes that are managed by Ansible.
5.1.5. ganglia
This is a set of configurations for Ganglia. There are separate playbooks for the Ganglia master and slaves.
ganglia_all.yml
A wrapper playbook that configures both the Ganglia master and the slaves.
ganglia_master.yml
The playbook to configure the Ganglia master.
ganglia_slave.yml
The playbook to configure the Ganglia slaves.
5.1.6. influxdb
all.yml
Configure InfluxDB and Grafana.
5.1.7. spark_comm
all.yml
Configure all nodes.
spark_base.yml
Execute the basic Spark configuration.
spark_client.yml
Configure a client environment for developing Spark applications.
spark_history.yml
Configure the environment to run the Spark history server.
spark_libs.yml
Configure the environment to use the native libraries in MLlib.
5.1.8. zeppelin
zeppelin.yml
Configure the Zeppelin environment.
5.1.9. fluentd
fluentd.yml
Configure Fluentd.
td_agent.yml
Configure td-agent.
5.1.10. kafka
kafka_brocker.yml
Configure Kafka broker nodes.
5.1.11. confluent
kafka_broker.yml
Configure Confluent Kafka brokers.
kafka_schema.yml
Configure the Confluent Schema Registry.
kafka_rest.yml
Configure the Confluent REST Proxy.
5.1.12. ambari
ambari_agent.yml
Configure the Ambari agent manually (not through the Ambari server).
ambari_server.yml
Configure the Ambari server.
5.1.13. jenkins
jenkins.yml
Configure the Jenkins server.
5.1.14. anacondace
anacondace2.yml
Configure Anaconda2 CE.
anacondace3.yml
Configure Anaconda3 CE.
5.1.15. postgresql
postgresql.yml
Configure PostgreSQL.
5.1.16. cdh5_hive
cdh5_hive.yml
Configure Hive and PostgreSQL.
5.1.17. alluxio_yarn
alluxio_yarn.yml
Configure Alluxio on YARN (the client and slave nodes).
5.1.18. tpc_ds
tpc_ds.yml
Configure the TPC-DS packages.
5.1.19. tensorflow
anaconda.yml
Configure Anaconda3 CE.
gpu_env.yml
Configure the CUDA and cuDNN environment.
keras.yml
Configure the Keras and TensorFlow environment (CPU).
keras_gpu.yml
Configure the Keras and TensorFlow environment (GPU).
5.2. Playbooks for operation
The playbooks in the “playbooks/operation” directory initialize and manage services.
This section gives a short description of each playbook.
5.2.1. cdh5
This is a set of operations for Hadoop services. See the README in the cdh5 directory for more information.
5.2.2. ec2
This is a set of operations for booting EC2 instances. See the README in the ec2 directory for more information.
5.2.3. influxdb
create_db.yml
Create all of the databases in InfluxDB.
create_graphite_db.yml
Create the InfluxDB database which holds data gathered via the Graphite protocol. This is mainly used by Spark.
create_grafana_db.yml
Create the InfluxDB database which holds Grafana’s dashboard data.
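A database-creation play of this kind can be sketched with the influx command-line client (the host group and database name are illustrative; the actual playbooks may differ):

```yaml
# Hypothetical sketch: create a database for Graphite-protocol data.
- hosts: influxdb_servers       # example group name
  tasks:
    - name: Create the graphite database
      command: influx -execute 'CREATE DATABASE graphite'
```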
5.2.4. spark_comm
make_spark_packages.yml
Compile the Spark sources and build the packages.
start_spark_historyserver.yml
Start the Spark history server.
stop_spark_historyserver.yml
Stop the Spark history server.
5.2.5. zeppelin
build.yml
Compile and package Zeppelin. This is a helper playbook for building Zeppelin according to the official Zeppelin web site.
restart_zeppelin.yml
Stop and start the Zeppelin services.
start_zeppelin.yml
Start the Zeppelin services by executing zeppelin-daemon.sh.
stop_zeppelin.yml
Stop the Zeppelin services by executing zeppelin-daemon.sh.
5.2.6. fluentd
restart_td_agent.yml
Stop and start td-agent.
start_td_agent.yml
Start td-agent.
stop_td_agent.yml
Stop td-agent.
5.2.7. kafka
restart_kafka.yml
Stop and start Kafka.
start_kafka.yml
Start Kafka.
stop_kafka.yml
Stop Kafka.
create_topic.yml
Create a topic on the Kafka cluster.
delete_topic.yml
Delete a topic on the Kafka cluster.
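Topic management of this kind is typically a thin wrapper around Kafka’s own CLI. A hypothetical sketch of what a create_topic.yml-style play might do (the group name, variable name, and partition/replication settings are all illustrative, not taken from this project):

```yaml
# Hypothetical sketch: create a topic with Kafka's CLI from one broker.
- hosts: kafka_brokers          # example group name
  tasks:
    - name: Create a Kafka topic
      command: >
        kafka-topics.sh --create
        --zookeeper localhost:2181
        --topic {{ topic_name }}
        --partitions 3 --replication-factor 2
```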
5.2.8. confluent
restart_kafka_rest.yml
Stop and start the REST Proxy service.
restart_kafka_server.yml
Stop and start the Kafka broker service.
restart_zookeeper_server.yml
Stop and start the ZooKeeper service.
If you configured a ZooKeeper service on the Kafka broker nodes, you can use this playbook to control it.
start_kafka_rest.yml
Start the Kafka REST Proxy service.
start_kafka_server.yml
Start the Kafka broker service.
start_schema_registry.yml
Start the Confluent Schema Registry service.
start_zookeeper_server.yml
Start the ZooKeeper service.
If you configured a ZooKeeper service on the Kafka broker nodes, you can use this playbook to control it.
stop_kafka_rest.yml
Stop the Kafka REST Proxy service.
stop_kafka_server.yml
Stop the Kafka broker service.
stop_schema_registry.yml
Stop the Confluent Schema Registry service.
stop_zookeeper_server.yml
Stop the ZooKeeper service.
If you configured a ZooKeeper service on the Kafka broker nodes, you can use this playbook to control it.
5.2.9. ambari
setup.yml
Set up the Ambari server.
The following playbooks start, stop, or restart individual services (or all of them):
restart_all.yml
restart_ambari_metrics.yml
restart_hdfs.yml
restart_yarn.yml
restart_zookeeper.yml
start_all.yml
start_ambari_metrics.yml
start_hdfs.yml
start_yarn.yml
start_zookeeper.yml
stop_all.yml
stop_ambari_metrics.yml
stop_hdfs.yml
stop_yarn.yml
stop_zookeeper.yml
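Since Ambari manages the services itself, plays of this kind are most naturally written against Ambari’s REST API. A hypothetical sketch of a start_hdfs.yml-style play using Ansible’s uri module (the cluster name, port, and credentials are illustrative; the actual playbooks may work differently):

```yaml
# Hypothetical sketch: ask Ambari to start the HDFS service via REST.
- hosts: ambari_server          # example group name
  tasks:
    - name: Start HDFS through the Ambari API
      uri:
        url: "http://localhost:8080/api/v1/clusters/mycluster/services/HDFS"
        method: PUT
        user: admin
        password: admin
        force_basic_auth: true
        headers:
          X-Requested-By: ansible
        body_format: json
        body:
          RequestInfo:
            context: Start HDFS
          Body:
            ServiceInfo:
              state: STARTED
        status_code: [200, 202]
```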
5.2.10. postgresql
initdb.yml
Initialize (set up) the database.
The following playbooks start, stop, or restart PostgreSQL:
start_postgresql.yml
stop_postgresql.yml
restart_postgresql.yml
5.2.11. cdh5_hive
create_metastore_db.yml
Create (set up) the metastore database.
The following playbooks start and stop the Hive metastore service:
start_metastore.yml
stop_metastore.yml
5.2.12. deploy_yarn
deploy_alluxio.yml
Deploy the Alluxio application to YARN.