Revisiting Ansible

I used Ansible at work a few months back. Today I hit a problem at work while trying to connect to a virtual machine on Google Cloud Platform that had been created with Ansible. I wasn’t surprised that I couldn’t be productive with it immediately, because I never learnt Ansible as thoroughly as I learnt Terraform. I learnt both of them from the docs. If you asked me to do something involving multiple machines, I would probably prefer Terraform over Ansible, for two reasons:

  • I prefer HCL (HashiCorp Configuration Language) over YAML.
  • The documentation for Terraform and the other HashiCorp projects is pretty good at convincing me to do things their way.

But Ansible is a great tool (based on my first encounter with it and the amount of stuff they are doing), and I don’t want to miss out on the party. I would really like to figure out when to use Ansible and when not to. The documentation covers a lot and is kind of scary for me. I went through a lot of it a few months back and I remember almost nothing. So this time I am taking a different route: I signed up for LinkedIn Learning (Lynda) and started watching the Learning Ansible course. Hopefully this will help me remember the core concepts of Ansible.

Introduction

  • Ansible is a task execution engine.
  • It helps computer people run some tasks on remote computers or on the machine on which ansible is running.
  • Configured using YAML, which can be source controlled and reused.
  • Open source project written in Python: https://github.com/ansible/ansible

Installation

Python is a prerequisite. Since I am trying this out on macOS, I also need pip in order to install Ansible.

# install pip
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py --user

# install ansible
pip install ansible --user

No Agents required

This seems to be the super cool thing about Ansible: “There is no need for running agents on remote machines to manage them.”

So if there is no agent installed on the machine, how does the machine receive the tasks to be executed?
By default, Ansible communicates with the machine over SSH.
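
To make the agentless idea concrete, here is a minimal Python sketch of pushing one command to a remote host by shelling out to the system ssh client. It only illustrates the "nothing but SSH on the target" idea and is not how Ansible is implemented internally. The function names build_ssh_command and run_remote are hypothetical, and the host in the usage note is just the example name from the inventory further down.

```python
import subprocess


def build_ssh_command(host, remote_command, user=None):
    """Build the argv list for running one command on a host via the ssh client."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which is what an unattended run needs.
    return ["ssh", "-o", "BatchMode=yes", target, remote_command]


def run_remote(host, remote_command, user=None, timeout=30):
    """Run the command on the remote host and return (exit_code, stdout, stderr)."""
    result = subprocess.run(
        build_ssh_command(host, remote_command, user),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr
```

Calling run_remote("foo.example.com", "uptime", user="test") would need a reachable host and key-based SSH access for that user; nothing has to be installed on the remote side beyond the SSH server.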

[Figure: Ansible overview]

No State

Ansible does not keep a state file like Terraform does. This means the same Ansible playbook can be run from any machine, and it will work out the current state of things at runtime.

Let me do a small experiment for this. I will bring up a GCP machine with Ansible using one fresh machine as the control node, then run the same YAML from another fresh control node. I ran this experiment as a runSh job on Shippable with two different nodes.

Wow! Ansible understood the state correctly and created the node only once, even though I ran the provisioning scripts twice from two different control nodes.

How does Ansible perform this magic?
I discovered the answer later: it is by using “dynamic inventories” (discussed further down).
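
The behaviour in the experiment can be sketched as "ask the provider what exists, then act only if something is missing". The Python below is a toy model of that idea under my own assumptions: FakeCloud, ensure_instance, the instance name demo-vm, and the machine type are all hypothetical, and no real Google Cloud API is called.

```python
class FakeCloud:
    """In-memory stand-in for a cloud provider API (hypothetical)."""

    def __init__(self):
        self.instances = {}

    def list_instances(self):
        return sorted(self.instances)

    def create_instance(self, name, machine_type):
        self.instances[name] = {"machine_type": machine_type}


def ensure_instance(cloud, name, machine_type):
    """Create the instance only if the provider does not already report it.

    Returns True when something changed, False when nothing had to be done.
    """
    if name in cloud.list_instances():
        return False
    cloud.create_instance(name, machine_type)
    return True


cloud = FakeCloud()
# Two runs from two different "control nodes" against the same provider.
first_run = ensure_instance(cloud, "demo-vm", "n1-standard-1")
second_run = ensure_instance(cloud, "demo-vm", "n1-standard-1")
print(first_run, second_run, cloud.list_instances())
```

Because the source of truth is the provider itself rather than a file on the control node, it does not matter which machine runs the second attempt.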

What does it need?

  • Inventory (I guess this is a collection of information about the hosts)
  • State Directives (I have no clue what this is)
  • Credentials (Secret stuff used to talk to cloud providers and machines)

Inventory

An inventory is a collection of hosts (IP addresses or host names). Hosts can be grouped together into named groups (for example, webservers and dbservers).

mail.example.com

[webservers]
foo.example.com
bar.example.com

[dbservers]
one.example.com
two.example.com
three.example.com
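
To show how little structure this file format has, here is a small Python sketch that parses the example above into a mapping of group name to hosts. It is my own simplified reader, not Ansible's parser: parse_inventory is a hypothetical name, and it only understands plain host lines and [group] headers. Hosts that appear before any header are placed in a group called "ungrouped", which is also the name Ansible uses for such hosts.

```python
INVENTORY = """\
mail.example.com

[webservers]
foo.example.com
bar.example.com

[dbservers]
one.example.com
two.example.com
three.example.com
"""


def parse_inventory(text):
    """Parse a simple INI-style inventory into {group_name: [hosts]}."""
    groups = {"ungrouped": []}
    current = "ungrouped"
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups.setdefault(current, [])
        else:
            # Keep only the host name; ignore any inline key=value variables.
            groups[current].append(line.split()[0])
    return groups


groups = parse_inventory(INVENTORY)
print(groups["webservers"])
```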

Multiple inventories are supported. I think we can define as many inventory files as we like and choose one when we run Ansible.

An inventory can also contain variables. These variables can be used while executing a task and for controlling the behaviour of Ansible itself (e.g. ansible_user=test). Group variables can also be kept in the inventory/group_vars folder.

[webservers:vars]
type="web"
port="80"

[dbservers:vars]
type="db"
port="5432"

[all:vars]
global="available for all hosts"
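
Here is a rough Python sketch of how I picture these variables being combined for a single host: start from the "all" variables, then layer the variables of each group the host belongs to on top. The function vars_for_host and the two dictionaries are hypothetical, and real Ansible has many more precedence levels than this.

```python
GROUPS = {
    "webservers": ["foo.example.com", "bar.example.com"],
    "dbservers": ["one.example.com", "two.example.com", "three.example.com"],
}

GROUP_VARS = {
    "all": {"global": "available for all hosts"},
    "webservers": {"type": "web", "port": "80"},
    "dbservers": {"type": "db", "port": "5432"},
}


def vars_for_host(host, groups, group_vars):
    """Merge 'all' variables with the variables of every group containing host."""
    merged = dict(group_vars.get("all", {}))
    for group, hosts in groups.items():
        if host in hosts:
            # More specific group variables override the 'all' defaults.
            merged.update(group_vars.get(group, {}))
    return merged


print(vars_for_host("one.example.com", GROUPS, GROUP_VARS))
```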

It seems like there are two types of inventory sources.

  • Static source, where information is maintained in files.
  • Dynamic source, where information is fetched from cloud providers like AWS, Google Cloud Platform etc.

This is exciting. I didn’t know about dynamic inventory sources before. I am going to spend some time reading about it here.

So this is why my GCE instance was not provisioned a second time in the earlier experiment: I was using a dynamic inventory, which contacted Google Cloud Platform for the existing infrastructure and updated the inventory on my machine.

I learnt how to use dynamic inventories by looking at a sample that we wrote at work.
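
I can't share that sample, so here is a generic sketch of a dynamic inventory script instead. Ansible can run an executable as an inventory source and expects JSON on stdout: --list returns the groups plus a "_meta" section with per-host variables, and --host <name> returns the variables for one host. Everything about the data is made up: fetch_instances stands in for a real cloud API call, and the host names and IP addresses are placeholders.

```python
import json
import sys


def fetch_instances():
    """Stand-in for a cloud API call (hypothetical data, no network access)."""
    return [
        {"name": "foo.example.com", "role": "webservers", "ip": "10.0.0.11"},
        {"name": "bar.example.com", "role": "webservers", "ip": "10.0.0.12"},
        {"name": "one.example.com", "role": "dbservers", "ip": "10.0.0.21"},
    ]


def build_inventory(instances):
    """Turn the instance list into the JSON structure Ansible reads."""
    inventory = {"_meta": {"hostvars": {}}}
    for inst in instances:
        group = inventory.setdefault(inst["role"], {"hosts": []})
        group["hosts"].append(inst["name"])
        inventory["_meta"]["hostvars"][inst["name"]] = {"ansible_host": inst["ip"]}
    return inventory


def main(argv):
    inventory = build_inventory(fetch_instances())
    if len(argv) == 2 and argv[0] == "--host":
        # Variables for a single host.
        print(json.dumps(inventory["_meta"]["hostvars"].get(argv[1], {})))
    else:
        # Default to the full listing, which is what --list asks for.
        print(json.dumps(inventory, indent=2))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as an executable file (say inventory.py, a name I picked), it could be passed to Ansible with -i inventory.py in place of a static inventory file.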

There are various ways to control execution across multiple machines: the linear strategy (the default), the free strategy, and the serial keyword for running a play in batches. Do check them out, as they are interesting and give you control over how Ansible runs commands on your target machines.
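
As a rough illustration of what running in batches means (the idea behind serial), the Python sketch below splits a host list into fixed-size batches, the way a rolling update would work through them one batch at a time. The helper name batches is hypothetical, and this is not Ansible's scheduling code.

```python
def batches(hosts, size):
    """Split hosts into consecutive batches of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


hosts = ["one.example.com", "two.example.com", "three.example.com"]
for batch in batches(hosts, 2):
    # Every task would finish on this batch before the next batch starts.
    print(batch)
```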

Tasks

  • Task: written in YAML and defines work to be done on the target machine.
  • Task Data: the data sent to the target host for execution (e.g. a database name could be task data).
  • Task Control: used to control the flow of tasks, such as looping over tasks.
  • Task Return Data: data returned as the result of executing a task on a host.
  • Modules: each task uses a module, which performs some operation. Check here for built-in modules.

Playbook

  • A collection of plays written in YAML. Each play is a collection of tasks.
  • Execute a playbook using the ansible-playbook command.

- name: Playbook for printing stuff
  hosts: localhost

  tasks:
    - name: use debug module
      debug:
        msg: "print debug message here"

    - name: use fail module
      fail:
        msg: "print error message here"
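
To make the playbook, play, and task nesting explicit, here is the same playbook written as a plain Python data structure, together with a toy runner that walks it. The runner is purely my own illustration (run_playbook is a hypothetical name). It only knows the debug and fail modules and executes nothing on any host.

```python
playbook = [
    {
        "name": "Playbook for printing stuff",
        "hosts": "localhost",
        "tasks": [
            {"name": "use debug module", "debug": {"msg": "print debug message here"}},
            {"name": "use fail module", "fail": {"msg": "print error message here"}},
        ],
    }
]


def run_playbook(playbook):
    """Walk every play and task; stop a play at its first failing task."""
    log = []
    for play in playbook:
        for task in play["tasks"]:
            if "debug" in task:
                log.append(("ok", task["name"], task["debug"]["msg"]))
            elif "fail" in task:
                log.append(("failed", task["name"], task["fail"]["msg"]))
                break  # later tasks in this play are skipped after a failure
    return log


for status, name, msg in run_playbook(playbook):
    print(f"{status}: {name} -> {msg}")
```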

ansible-doc

The ansible-doc utility is installed along with Ansible. It is useful for getting help and reading about Ansible modules.

$ ansible-doc debug

Terraform vs Ansible

This is the part I have been waiting for. Here is my conclusion.

Use Terraform if provisioning is the only need.

Use Ansible if you are trying to manage nodes that are already provisioned.

When you want to provision and manage, go with either or both of them, depending on your workflow.

So it is never Terraform vs Ansible; it is Terraform + Ansible. They can really augment each other, and products like Shippable could help achieve this at an organisation level.

More

I guess there is a lot more to explore and learn. I will probably note some topics here for future exploration.